CHAPTER I


INTRODUCTION 

1.1 Introductory Remarks

The subject of this dissertation is a calculation algorithm for the p-q solution of the degenerate linear system

$$Y = AX \qquad (1.1)$$

where A is an m×n linear transformation matrix with Y and X elements of the real m-dimensional and n-dimensional normed linear spaces $V_m$ and $V_n$ with norms $\|\cdot\|_m$ and $\|\cdot\|_n$, respectively [49, p. 83]. After Frame [13], the system is said to be degenerate in that $m \neq n$ or there is no exact solution X to (1.1) for a given A and Y. The p-q solution X of (1.1) is a special case of a best approximate solution of (1.1) when $V_m$ and $V_n$ are restricted, respectively, to the finite dimensional normed linear spaces $\ell_p(m)$ and $\ell_q(n)$ with norms

$$\|Y\|_p = \left(|Y_1|^p + \cdots + |Y_m|^p\right)^{1/p}$$

and

$$\|X\|_q = \left(|X_1|^q + \cdots + |X_n|^q\right)^{1/q}$$

for $1 < p < \infty$, $1 < q < \infty$ [12], [49, pp. 87-88], where a best approximate solution is defined as follows:

DEFINITION 1.1 [39]: A best approximate solution of the equation f(X) = G is $X_0$ if for all X, either

a) $\|f(X) - G\| > \|f(X_0) - G\|$

or

b) $\|f(X) - G\| = \|f(X_0) - G\|$ and $\|X\| \geq \|X_0\|$.

This definition is similar to the fundamental definitions found in references [5, p. 13], [6, p. 3], [29, p. 16], [34, p. 1], [37], [50, p. 79].
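As a small illustration of Definition 1.1 with f(X) = AX and G = Y, measuring the residual in $\ell_p(m)$ and the size of the solution in $\ell_q(n)$, the two-level comparison can be written as a predicate. This is only a sketch: `lp_norm`, `matvec`, and `at_least_as_good` are hypothetical helper names introduced here, and the tolerance `tol` is an added assumption, needed because floating-point residuals are rarely exactly equal.

```python
def lp_norm(x, p):
    """l_p norm (|x_1|^p + ... + |x_k|^p)^(1/p), for 1 <= p < infinity."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def matvec(A, x):
    """Product AX for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def at_least_as_good(A, Y, X0, X, p, q, tol=1e-12):
    """True if candidate X does not beat X0 in the sense of Definition 1.1.

    X0 is a best approximate solution exactly when this holds for every X.
    """
    r0 = lp_norm([u - v for u, v in zip(matvec(A, X0), Y)], p)
    r = lp_norm([u - v for u, v in zip(matvec(A, X), Y)], p)
    if r > r0 + tol:            # case a): X has a strictly larger residual
        return True
    if abs(r - r0) <= tol:      # case b): equal residuals, compare ||X||_q
        return lp_norm(X, q) >= lp_norm(X0, q) - tol
    return False                # X has a strictly smaller residual


# A = [1 1], Y = (2): every X with x_1 + x_2 = 2 solves AX = Y exactly.
A, Y = [[1.0, 1.0]], [2.0]
print(at_least_as_good(A, Y, [1.0, 1.0], [2.0, 0.0], p=2, q=2))  # True
print(at_least_as_good(A, Y, [2.0, 0.0], [1.0, 1.0], p=2, q=2))  # False
```

With q = 1 the two candidates tie (both have $\|X\|_1 = 2$), so the predicate holds in both directions; this is a small instance of the loss of uniqueness without strict convexity, which is why the range $1 < q < \infty$ matters.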

If we let S(A,Y) be the set of all best approximate solutions of the equation Y = AX, then, for a given operator A, the set-valued operator $B_A$ which maps Y onto S(A,Y) will be called the norm generalized inverse. If we specialize $V_m$ and $V_n$ to $\ell_p(m)$ and $\ell_q(n)$, then we will call $B_A$ the p-q generalized inverse.



This generalized inverse was suggested by P. L. Odell, introduced by M. Meicler [30, p. 39], and developed by Meicler, Odell, and Newman [33], [36], [37].

Chapter II gives some properties of the norm generalized inverse of A, defines a norm generalized inverse for the norm generalized inverse $B_A$ of A, and examines some of its properties. The definition and convergence theorems for an algorithm to calculate the p-q generalized inverse are developed in Chapter III. A basic definition and some notation needed in the subsequent chapters will be presented next.

1.2 A Basic Definition and Some Notation

DEFINITION 2.1: For any m×n matrix A and n×m matrix B, consider the four equations

1. $ABA = A$ (2.1)
2. $BAB = B$ (2.2)
3. $(BA)^T = BA$ (2.3)
4. $(AB)^T = AB$ (2.4)

where the superscript T indicates matrix transpose. If B satisfies

a) equation 1, then B is said to be a generalized inverse of A and is denoted by $B = A^g$;

b) equations 1 and 2, then B is said to be a reflexive generalized inverse of A and is denoted by $B = A^r$;

c) equations 1, 2, and 3, then B is said to be a left weak generalized inverse of A and is denoted by $B = A^n$;

d) equations 1, 2, and 4, then B is said to be a right weak generalized inverse of A and is denoted by $B = A^w$;

e) equations 1, 2, 3, and 4, then B is said to be a pseudoinverse of A and is denoted by $B = A^+$.
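The classes of Definition 2.1 can be told apart mechanically by checking which of equations (2.1) to (2.4) hold. The sketch below assumes NumPy; `penrose_equations` and `classify` are hypothetical helper names, and the tolerance `tol` is an assumption for floating-point comparison.

```python
import numpy as np


def penrose_equations(A, B, tol=1e-10):
    """Return the set of Penrose equations (2.1)-(2.4) that B satisfies for A."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    holds = {
        1: np.allclose(A @ B @ A, A, atol=tol),      # ABA = A
        2: np.allclose(B @ A @ B, B, atol=tol),      # BAB = B
        3: np.allclose((B @ A).T, B @ A, atol=tol),  # (BA)^T = BA
        4: np.allclose((A @ B).T, A @ B, atol=tol),  # (AB)^T = AB
    }
    return {k for k, ok in holds.items() if ok}


def classify(A, B):
    """Name the strongest class of Definition 2.1 to which B belongs."""
    eqs = penrose_equations(A, B)
    if eqs == {1, 2, 3, 4}:
        return "pseudoinverse"
    if {1, 2, 3} <= eqs:
        return "left weak generalized inverse"
    if {1, 2, 4} <= eqs:
        return "right weak generalized inverse"
    if {1, 2} <= eqs:
        return "reflexive generalized inverse"
    if 1 in eqs:
        return "generalized inverse"
    return "not a generalized inverse"


A = np.array([[1.0, 0.0], [0.0, 0.0]])
print(classify(A, np.linalg.pinv(A)))                   # pseudoinverse
print(classify(A, np.array([[1.0, 1.0], [0.0, 0.0]])))  # left weak generalized inverse
print(classify(A, np.array([[1.0, 0.0], [1.0, 0.0]])))  # right weak generalized inverse
print(classify(A, np.array([[1.0, 0.0], [0.0, 5.0]])))  # generalized inverse
```

The last example satisfies equations 1, 3, and 4 but not 2, so it is a generalized inverse without being reflexive.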

The four equations were introduced by Penrose [38]. His notation is used for the pseudoinverse. The names for the inverses defined in statements a), b), and e) and the notation for a), b), and c) are due to Rohde [47]. The name weak generalized inverse originated with Goldman and Zelen [15], but the left and right designations are due to Cline [8]. The notation for d) is from Bouillon and Odell [3].

These generalized inverses will be used throughout the paper.

Also used throughout are the letters I and φ, which are the identity matrix and the zero vector or matrix of zeros. Usage will indicate the order, with $I_k$ and $\phi_k$ denoting the k×k identity and the k×1 column vector of zeros if necessary. Also used is ∅ for the null or empty set. Boldface N and R are used for the null set of the operator Q,

$$N(Q) = \{X \in V : Q(X) = \phi\},$$

and the range set of the operator Q,

$$R(Q) = \{Y : Q(X) = Y,\ X \in V\}.$$

The operator Q is not necessarily linear.

The symbol ⊕ will denote the direct sum of two subspaces [17, p. 24]. For a matrix A, $A^i$ will denote the ith row of A, $A_j$ will denote the jth column, and $A^i_j$ the element in the jth column, ith row. Scalars are real numbers and are denoted by lower case Roman and Greek letters. For typing convenience, the Greek letter epsilon (ε) will be used for the set theory "element of" symbol ∈, except where some confusion may occur with an epsilon used in limit proofs. In these cases, the symbol ∈ will be used to denote "element of."

When a theorem or definition is known in the literature, this fact will be noted by a reference after the statement of the theorem or definition. If a known proof is included for completeness, then this fact will be noted by a reference after the identifier "Proof" or the identifier of a subsection of the proof.
CHAPTER II


THE NORM GENERALIZED INVERSE 

2.1 The Metric Projection in a Finite Dimensional Normed Linear Space

DEFINITION 1.1 [35]: Let V be a real normed vector space and M be a subset. For X in V, let $E_M(X)$ denote the set of nearest points in M to X, i.e.,

$$E_M(X) = \{Y \in M : \|X - Y\| \leq \|X - Z\| \text{ for all } Z \in M\}.$$

The set-valued mapping $E_M$ is called the metric projection onto M. Let $\mathcal{M}$ denote the set of all metric projections onto subspaces of V.

The concept of a metric projection has been 
discussed by several authors, among them Blatter and 
Morris [2], Brown [4], Cheney and Wulbert [7], Lazar, 
Morris, and Wulbert [24], and others [25], [35], [40], 
[48], [53]. 

Conditions under which a metric projection exists and $E_M(X)$ is a single point for all elements X are given below.

DEFINITION 1.2 [6, p. 22]: A normed linear vector space is said to be strictly convex if and only if, for all elements X and Y of the space,

$$\|X\| = \|Y\| = \|(X + Y)/2\| = 1 \text{ implies } X = Y.$$

THEOREM 1.3 [6, p. 23]: In a strictly convex normed linear space V, a nonempty real finite dimensional subspace M contains a unique point closest to any given point of V.


To prove the theorem it is only necessary that M be convex, closed, complete, and contained in a finite dimensional subspace. We can see this by letting Y ∈ V and $d = \inf_{X \in M} \|X - Y\|$. If Y ∈ M, Y itself is the unique point closest to Y. So let Y ∉ M. Then for Z ∈ M define

$$S = \{X : \|X - Y\| \leq \|Z - Y\|,\ X \in M\}.$$

Then

$$d = \inf_{X \in M} \|X - Y\| = \inf_{X \in M} \{\|X - Y\| : \|X - Y\| \leq \|Z - Y\|\} = \inf_{X \in S} \|X - Y\| = \inf f(S),$$

where $f(X) = \|X - Y\|$, since the statement ($d \leq f(X)$ for all X ∈ M) is unaffected by the requirement $f(X) \leq f(Z)$ when $d \leq f(Z)$. S is closed in M, and thus in V since M is closed. Also, S is bounded, since $\|X - Y\| \leq f(Z)$ for every X ∈ S.

Since d = inf f(S), there exists a monotone decreasing sequence of real numbers $\{f_i\} \subset f(S)$ such that $\lim_{i \to \infty} f_i = d$. Now $f_i \in f(S)$ implies there are points $X_i \in S$ such that $f(X_i) = f_i$. Using the standard distance function ρ, calculate, if n > m,

$$\rho(X_n, X_m) = \|X_n - X_m\| \leq \|X_n - Y\| + \|Y - X_m\| = f_n + f_m \leq 2f_m \leq 2f_1,$$

showing that $\{X_i\}$ is a bounded sequence. Therefore, since M lies in a finite dimensional space, there exists a convergent subsequence $\{X_{k_i}\}$ having a limit, say $X_0$, which lies in S ⊂ M because S is closed. Since f is continuous,

$$d = f(X_0) = \|X_0 - Y\|.$$

The uniqueness follows the proof by Cheney [6, p. 23].

From the above remarks, notice that strict convexity was not required to prove existence. Uniqueness may be lost without strict convexity. For example, let $V = R^2$, $\|X\| = \max\{|x_1|, |x_2|\}$, and $M = \{X : x_2 = 0\}$. Let Y = (0,1). Since $X - Y = (x_1, -1)$ for X ∈ M, the problem is then to minimize $\max\{|x_1|, 1\}$ over $x_1 \in R$, which is minimized (with value 1) for all points $-1 \leq x_1 \leq 1$.
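A grid check of this example makes the loss of uniqueness concrete: every sampled point of M with $|x_1| \leq 1$ is at the minimum distance 1 from Y. The sampling grid is an illustrative assumption; the helper name `max_norm` is introduced here.

```python
def max_norm(v):
    """||X|| = max{|x_1|, |x_2|} on R^2 (not strictly convex)."""
    return max(abs(t) for t in v)


Y = (0.0, 1.0)
# Sample points X = (x_1, 0) of M on the grid x_1 = -2, -1.75, ..., 2.
grid = [k / 4.0 for k in range(-8, 9)]
dist = {x1: max_norm((x1 - Y[0], 0.0 - Y[1])) for x1 in grid}
d = min(dist.values())
nearest = [x1 for x1 in grid if dist[x1] == d]
print(d)        # 1.0
print(nearest)  # every sampled x_1 with -1 <= x_1 <= 1
```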

THEOREM 1.4 [37]: The following are properties of the metric projection mapping $E = E_M$ on a subspace M with norm $\|\cdot\|$:

a) E(aX) = aE(X), for any scalar a;

b) $E^2 = E$;

c) E(X) = X if and only if X ∈ M;

d) E(X + Y) = E(X) + Y for X ∈ V, Y ∈ M;

e) E(X + E(Y)) = E(X) + E(Y) for all X, Y ∈ V;

f) E(X - E(X)) = φ for all X ∈ V.

THEOREM 1.5: In a normed linear space V, let Q be an operator from V into V and consider the properties:

1. $Q^2 = Q$;

2. Q(φ) = φ;

3. Q(Y - X) = Q(Y) - X for X, Y ∈ V and Q(X) = X;

4. Q(Y ± X) = Q(Y) ± X for X, Y ∈ V and Q(X) = X;

5. Q(Y + aX) = Q(Y) + aX for X, Y ∈ V and Q(X) = X and a any real scalar.

It follows that

a) if properties 1, 2, and 3 hold, then each Y ∈ V can be written as Y = X + Z, where Q(X) = X and Q(Z) = φ;

b) if properties 1, 2, and 4 hold, then a is true and X and Z are a unique pair;

c) if properties 1, 2, and 5 hold, then b is true and $U = \{X \in V : Q(X) = X\}$ is a subspace;

d) if properties 1, 2, and 5 hold and $W = \{Z \in V : Q(Z) = \phi\}$ is a subspace, then Q is a linear operator.

Proof: Let Y ∈ V, X = Q(Y), Z = Y - X.

a) We need only show that

$$X = Q(Y) = Q^2(Y) = Q(Q(Y)) = Q(X)$$

and

$$Q(Z) = Q(Y - X) = Q(Y) - X = \phi.$$

b) We need only show uniqueness, so assume that $Y = X_1 + Z_1$ also, with $X_1 \in U$, $Z_1 \in W$ (U and W as in c and d). Then using property 4 we find that

$$X = Q(Y) = Q(Z_1 + X_1) = Q(Z_1) + X_1 = X_1.$$

By subtraction, $Z = Z_1$.

c) Property 5 with a = ±1 gives property 4, so b is true. Using property 5 twice, we find that

$$Q(aX_1 + \beta X_2) = Q(aX_1 + \phi) + \beta X_2 = aX_1 + \beta X_2$$

for any $X_1, X_2 \in U$ and a, β scalars.

d) Let $Y_1, Y_2 \in V$, $X_1 = Q(Y_1)$, $X_2 = Q(Y_2)$, so that $Y_1 - X_1, Y_2 - X_2 \in W$ by a), making $a(Y_1 - X_1) + \beta(Y_2 - X_2) \in W$ and, since $aX_1 + \beta X_2 \in U$ by c),

$$\phi = Q[a(Y_1 - X_1) + \beta(Y_2 - X_2)] = Q(aY_1 + \beta Y_2 - aX_1 - \beta X_2) = Q(aY_1 + \beta Y_2) - aQ(Y_1) - \beta Q(Y_2),$$

showing Q is a linear operator.

DEFINITION 1.6: a) If Q is an operator from the normed linear space V into itself, Q is called a projection operator if Q has the properties $Q^2 = Q$ and Q(φ) = φ. Let $\mathcal{P}$ denote the set of projections.

b) If Q is a projection and Q(Y - X) = Q(Y) - X for all X, Y ∈ V such that Q(X) = X, then Q is called a true projection operator. Let $\mathcal{T}$ denote the set of true projections.

c) If Q is a projection and Q(Y ± X) = Q(Y) ± X when Q(X) = X, then Q is called a unique projection operator. Let $\mathcal{U}$ denote the set of unique projections.

d) If Q is a projection operator and Q(Y + aX) = Q(Y) + aX when Q(X) = X and a is a scalar, then Q is called a spatial projection operator. Let $\mathcal{S}$ denote the set of spatial projections.

Corollary 1.7 establishes some relationships between 
the types of projection operators and the properties 
of the sets they generate. 

COROLLARY 1.7: Let V be a normed linear space, Q a projection on V, Y ∈ V, X = Q(Y), Z = Y - X, $U = \{X : Q(X) = X\}$, and $W = \{Z : Q(Z) = \phi\}$. Then

a) if $Q \in \mathcal{T}$, Z ∈ W;

b) if $Q \in \mathcal{U}$, Z ∈ W and the pair of vectors X, Z is unique for any Y ∈ V;

c) if $Q \in \mathcal{S}$, Z ∈ W and U is a subspace;

d) Q is a linear projection if and only if $Q, I - Q \in \mathcal{S}$, making V = U ⊕ W;

e) $\mathcal{M} \subset \mathcal{S}$;

f) letting $\mathcal{L}$ denote the set of linear projections, $\mathcal{L} \subset \mathcal{M} \subset \mathcal{S} \subset \mathcal{U} \subset \mathcal{T} \subset \mathcal{P}$.

Proof: Parts a through e are reformulations of Theorem 1.5 (and, for e, of Theorem 1.4) using the notation of Definition 1.6.

f) $\mathcal{M} \subset \mathcal{S} \subset \mathcal{U} \subset \mathcal{T} \subset \mathcal{P}$ and $\mathcal{L} \subset \mathcal{S} \subset \mathcal{U}$ follow immediately from a through e. To show $\mathcal{L} \subset \mathcal{M}$, let $K \in \mathcal{L}$. A norm must be found on V so that for any Y ∈ V, $\|Y - KY\| \leq \|Y - KX\|$ for any X ∈ V.

Consider the weighted square norm

$$\|Z\| = \{Z^T[K^TK + (I - K)^T(I - K)]Z\}^{1/2}.$$

This is a norm since $K^TK$ and $(I - K)^T(I - K)$ are positive semidefinite, and if Z ≠ φ, then either KZ ≠ φ or (I - K)Z ≠ φ, making either $Z^TK^TKZ > 0$ or $Z^T(I - K)^T(I - K)Z > 0$. Consequently we need only show that KY minimizes the norm for $M = \{X : KX = X\} = \{X : KY = X,\ Y \in V\}$. This is accomplished utilizing the following steps for Y, X ∈ V:

1. $(Y - KY)^TK^TK(Y - KY) = Y^T(I - K)^TK^TK(I - K)Y = Y^T(I - K)^TK^T(K - K^2)Y = 0$;

2. $(Y - KY)^T(I - K)^T(I - K)(Y - KY) = Y^T(I - K)^T(I - K)^T(I - K)(I - K)Y = Y^T(I - K)^T(I - K)Y$;

3. $(Y - KX)^TK^TK(Y - KX) \geq 0 = (Y - KY)^TK^TK(Y - KY)$;

4. since $(I - K)K = K - K^2 = 0$,

$$\begin{aligned} (Y - KX)^T(I - K)^T(I - K)(Y - KX) &= Y^T(I - K)^T(I - K)Y - X^TK^T(I - K)^T(I - K)Y \\ &\quad - Y^T(I - K)^T(I - K)KX + X^TK^T(I - K)^T(I - K)KX \\ &= Y^T(I - K)^T(I - K)Y \\ &= (Y - KY)^T(I - K)^T(I - K)(Y - KY); \end{aligned}$$

5. combining steps 1 to 4,

$$\begin{aligned} \|Y - KY\|^2 &= (Y - KY)^T[K^TK + (I - K)^T(I - K)](Y - KY) \\ &= (Y - KY)^TK^TK(Y - KY) + (Y - KY)^T(I - K)^T(I - K)(Y - KY) \\ &\leq (Y - KX)^TK^TK(Y - KX) + (Y - KX)^T(I - K)^T(I - K)(Y - KX) \\ &= (Y - KX)^T[K^TK + (I - K)^T(I - K)](Y - KX) \\ &= \|Y - KX\|^2. \end{aligned}$$

One set of interesting linear metric projections is defined on the range space R(A) and null space N(A) of a linear operator A. These projections are defined in terms of the generalized inverses of Definition 2.1 in Chapter I. Early versions of the following theorem were proved by Desoer and Whalen [10] and by Ben-Israel and Charnes [1].

THEOREM 1.8 [3, p. 15]: Let A and B be m×n and n×m matrices, respectively, with A mapping $V_n$ into $V_m$ and B mapping $V_m$ into $V_n$. Then

a) if B is a generalized inverse of A, there are unique subspaces U and W such that $V_m = R(A) \oplus U$ and $V_n = N(A) \oplus W$;

b) if B is a reflexive generalized inverse, then $V_m = R(A) \oplus N(B)$, $V_n = N(A) \oplus R(B)$;

c) if B is a left weak generalized inverse, then N(A) and R(B) are orthogonal;

d) if B is a right weak generalized inverse, then R(A) and N(B) are orthogonal;

e) if B is the pseudoinverse of A, R(A) and N(B) as well as N(A) and R(B) are orthogonal.

The next theorem obtains the metric projections onto R(A) and N(A) when $V_n = \ell_2(n)$ and $V_m = \ell_2(m)$ in terms of the pseudoinverse $A^+$ of A.

THEOREM 1.9 [37]: Let A be an m×n matrix. Then

a) if $V_m = \ell_2(m)$, the metric projection onto R(A) is $AA^+$;

b) if $V_n = \ell_2(n)$, the metric projection onto N(A) is $(I - A^+A)$.
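Theorem 1.9 can be checked numerically with NumPy's pseudoinverse: $AA^+Y$ should coincide with $A\hat{X}$ for a least-squares solution $\hat{X}$, and $(I - A^+A)X$ should lie in N(A) with a residual orthogonal to N(A). The sketch assumes NumPy; the random rank-2, 5×4 matrix and the seed are arbitrary test choices.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 5, 4, 2
A = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))   # 5x4 matrix of rank 2
Ap = np.linalg.pinv(A)

# a) metric projection onto R(A) in l_2(m)
Y = rng.normal(size=m)
E_Y = A @ Ap @ Y
x_ls, *_ = np.linalg.lstsq(A, Y, rcond=None)    # a least-squares solution
print(np.allclose(E_Y, A @ x_ls))               # same nearest point of R(A)

# b) metric projection onto N(A) in l_2(n)
X = rng.normal(size=n)
N = np.eye(n) - Ap @ A                          # symmetric; its columns span N(A)
F_X = N @ X
print(np.allclose(A @ F_X, 0))                  # F_X lies in N(A)
print(np.allclose(N.T @ (X - F_X), 0))          # X - F_X is orthogonal to N(A)
```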

Theorem 1.9 shows that there are some metric projections which are also linear projections. However, not all metric projections are linear, as can be seen from the following example, which was quoted by Newman and Odell [37] and attributed to Charles Anderson of Southern Methodist University.

Let $V = \ell_p(3)$ with $1 < p < \infty$ and $M = \{X : X = a(1,1,1),\ a \text{ a real scalar}\}$. Suppose E, the metric projection on M, is linear. Then

$$E(1,0,0) = a_0(1,1,1), \qquad a_0 = \arg\min_a \left(|1 - a|^p + |a|^p + |a|^p\right)^{1/p},$$

and, by the symmetry of the norm in the three coordinates, E(1,0,0) = E(0,1,0) = E(0,0,1), which implies

$$E(3,0,0) = 3E(1,0,0) = E(1,0,0) + E(0,1,0) + E(0,0,1) = E(1,1,1) = (1,1,1)$$

by Theorem 1.4 and the linearity of E. Therefore, the function $f(a) = \|(a - 3, a, a)\|_p$ is minimized uniquely for a = 1, since the $\ell_p(3)$ norm is strictly convex for $1 < p < \infty$. Since $f^p(a) = |a - 3|^p + 2|a|^p$ is differentiable for $1 < p < \infty$, it must be true that

$$0 = (f^p)'(1) = -2p\,2^{p-2} + 2p,$$

or that $2^{p-2} = 1$, which is true if and only if p = 2. Therefore, E is linear if and only if p = 2 (the converse direction follows from Theorem 1.9). This result suggests a lemma and a theorem.
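The example can also be checked numerically by minimizing $f^p(a) = |3 - a|^p + 2|a|^p$ directly. Setting $(f^p)'(a) = 0$ on $0 < a < 3$ gives the closed form $a = 3/(1 + 2^{1/(p-1)})$ (an elementary calculation added here, not taken from [37]), which equals 1 only for p = 2. The ternary search and the helper names `objective` and `argmin_convex` are illustrative choices; the search is valid because $f^p$ is convex in a for p ≥ 1.

```python
def objective(a, p):
    """||(3, 0, 0) - a(1, 1, 1)||_p ** p; it has the same minimizer as the norm."""
    return abs(3 - a) ** p + 2 * abs(a) ** p


def argmin_convex(f, lo, hi, iters=200):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


for p in (1.5, 2.0, 3.0):
    a = argmin_convex(lambda t: objective(t, p), 0.0, 3.0)
    closed_form = 3 / (1 + 2 ** (1 / (p - 1)))
    print(p, round(a, 6), round(closed_form, 6))
```

For p = 1.5 the nearest point of M to (3,0,0) is 0.6(1,1,1) and for p = 3 it is about 1.2426(1,1,1), so E(3,0,0) ≠ (1,1,1) and E cannot be linear.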



LEMMA 1.10 [37]: Let M be a hyperplane contained in the normed linear vector space V of dimension n. Then the metric projection E of V on M is a linear transformation.

THEOREM 1.11 [37]: Consider the spaces $\ell_p(n)$, $1 < p < \infty$. The metric projection E onto M is linear for every non-null subspace M if and only if $n \leq 2$ or p = 2.

LEMMA 1.12 [4]: For any metric projection E and any sequence $\{X_n\}$ such that $\lim_{n \to \infty} X_n = Y$, if $Z = \lim_{n \to \infty} E(X_n) \in R(E)$, then $Z \in E(Y)$.

Proof (similar to that of Brown [4]): Let $Z_n = E(X_n)$. Then

$$\|Y - Z\| = \|Y - X_n + X_n - Z_n + Z_n - Z\| \leq \|Y - X_n\| + \|X_n - Z_n\| + \|Z_n - Z\| = \|Y - X_n\| + \|X_n - E(X_n)\| + \|Z_n - Z\|.$$

Now since E(Y) ∈ R(E),

$$\|X_n - E(X_n)\| \leq \|X_n - E(Y)\| \leq \|X_n - Y\| + \|Y - E(Y)\|,$$

so that

$$\|Y - Z\| \leq 2\|X_n - Y\| + \|Y - E(Y)\| + \|Z_n - Z\|.$$

Since $\lim X_n = Y$ and $\lim Z_n = Z$, given $\varepsilon_1, \varepsilon_2 > 0$ there is an N such that if n > N,

$$\|X_n - Y\| < \varepsilon_1, \qquad \|Z_n - Z\| < \varepsilon_2,$$

and thus

$$\|Y - Z\| < 2\varepsilon_1 + \|Y - E(Y)\| + \varepsilon_2.$$

Now since the above is true for any $\varepsilon_1, \varepsilon_2 > 0$,

$$\|Y - Z\| \leq \|Y - E(Y)\|,$$

which implies Z ∈ E(Y) since Z ∈ R(E).

COROLLARY 1.13 [4]: For any metric projection E on a finite dimensional subspace M in a strictly convex space V, E(X) is a continuous function of X on V.


2.2 Properties of the Norm Generalized Inverse


Existence, uniqueness, and properties of the norm generalized inverse of an m×n matrix A will be developed in terms of the metric projections on the corresponding spaces $V_m$ and $V_n$. The norm generalized inverse of the norm generalized inverse of A will be defined and some properties established.

THEOREM 2.1 [37]: For each $Y \in V_m$ and every pair of strictly convex norms, there exists a unique best approximate solution $X_0 \in V_n$ of the system AX = Y. If E and F are the metric projections onto R(A) and N(A), respectively, B is the norm generalized inverse of A, and $A^g$ is any generalized inverse of A, then the solution can be written symbolically as

$$X_0 = B(Y) = (I - F)A^gE(Y).$$


Proof: We will show that $X_0$ satisfies Definition 1.1 of Chapter I. Let $Y \in V_m$. Now since A is a linear operator, R(A) is a subspace of the strictly convex space $V_m$, which implies, by Theorem 1.3, that there exists a metric projection E onto R(A) such that E(Y) is unique. Let $Y_0 = E(Y) \in R(A)$. Let $A^g$ be a generalized inverse of A and consider $X_1 = A^gY_0 \in V_n$. Observe that $AX_1 = AA^gY_0 = Y_0$, using Theorem 1.8 a) (since $Y_0 \in R(A)$ and $AA^gA = A$), so that $X_1$ is a solution to $AX = Y_0$. If we choose another generalized inverse $A_2^g$ and let $X_2 = A_2^gY_0$, then $AX_2 = Y_0$. Now if $X_2$ is any vector such that $AX_2 = Y_0$, then $A(X_1 - X_2) = \phi$. Consequently the difference between any two solutions, in particular between those produced by any two generalized inverses, is an element of N(A), and the set S of X's satisfying $AX = Y_0$ is characterized as

$$S = \{X : X = X_1 - Z,\ Z \in N(A)\}.$$

This is then the set of all points in $V_n$ which can be best approximate solutions to AX = Y. We must now find that subset of S, say S', of minimum norm in $V_n$, or in other words, those $X_0 \in S'$ such that $\|X_0\| \leq \|X\|$ for all X ∈ S. Using the N(A) characterization, we must find those $Z_0 \in N(A)$ such that $\|X_1 - Z_0\| \leq \|X_1 - Z\|$ for all Z ∈ N(A). But this is simply finding the metric projection $F(X_1)$ on N(A), which yields a unique $Z_0$ by Theorem 1.3, since $V_n$ is strictly convex and N(A) is a subspace. Therefore, the best approximate solution is

$$X_0 = X_1 - F(X_1) = (I - F)X_1 = (I - F)A^gY_0 = (I - F)A^gE(Y) = B(Y).$$
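In the special case p = q = 2 the metric projections are known in closed form (Theorem 1.9), so the formula $B = (I - F)A^gE$ can be evaluated directly and should not depend on which generalized inverse is used. The sketch below assumes NumPy and builds $A^g$ from the family $A^+ + (I - A^+A)Z_1 + Z_2(I - AA^+)$ with random $Z_1, Z_2$; this family is an illustrative choice (the code verifies $AA^gA = A$ itself), and the seed and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 5, 4, 2
A = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))   # 5x4 matrix of rank 2
Ap = np.linalg.pinv(A)
I_m, I_n = np.eye(m), np.eye(n)

# A generalized inverse that is (almost surely) different from A^+.
Z1, Z2 = rng.normal(size=(n, m)), rng.normal(size=(n, m))
Ag = Ap + (I_n - Ap @ A) @ Z1 + Z2 @ (I_m - A @ Ap)
assert np.allclose(A @ Ag @ A, A)          # equation (2.1) of Chapter I

E = A @ Ap            # metric projection onto R(A) in l_2(m), Theorem 1.9 a)
F = I_n - Ap @ A      # metric projection onto N(A) in l_2(n), Theorem 1.9 b)
B = (I_n - F) @ Ag @ E

print(np.allclose(Ag, Ap))   # the generalized inverse itself is not A^+
print(np.allclose(B, Ap))    # but B = (I - F) A^g E is
```

The second check anticipates part c) of Corollary 2.3 below.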


COROLLARY 2.2 [37]: In the notation of Theorem 2.1 the following properties hold:

a) AF = 0, the zero or null operator;

b) EA = A;

c) BE = B;

d) $AA^gE = E$;

e) AB = E;

f) $BA = (I - F)A^gA$;

g) ABA = A;

h) BAB = B.




COROLLARY 2.3 [37]:

a) If $V_n = \ell_2(n)$, then $B = A^+E$;

b) If $V_m = \ell_2(m)$, then $B = (I - F)A^+$;

c) If $V_n = \ell_2(n)$ and $V_m = \ell_2(m)$, then $B = A^+$;

d) If the rank of A is n < m, then F = 0 and $B = A^gE$.

The result of part c was first shown by 
Penrose [39]. The dependence of the linearity of the 
norm generalized inverse B upon the linearity of the 
metric projections E and F will be shown next. 


THEOREM 2.4: The norm generalized inverse B of an m×n matrix A is linear if and only if the metric projections F and E are linear over $V_n$ and $V_m$, respectively.

Proof: If F and E are linear, then B is linear since $B = (I - F)A^gE$ and $A^g$ is a matrix and therefore linear.


If B is linear, then for α, β scalars and $Y_1, Y_2 \in V_m$,

$$B(\alpha Y_1 + \beta Y_2) = \alpha B(Y_1) + \beta B(Y_2),$$

which becomes

$$A^g[E(\alpha Y_1 + \beta Y_2) - \alpha E(Y_1) - \beta E(Y_2)] = FA^gE(\alpha Y_1 + \beta Y_2) - \alpha FA^gE(Y_1) - \beta FA^gE(Y_2) \qquad (2.1)$$

after substitution and some algebraic manipulation. Observing that $E(\alpha Y_1 + \beta Y_2) - \alpha E(Y_1) - \beta E(Y_2) \in R(A)$, so that $AA^g$ leaves it unchanged, and that AF = 0 by Corollary 2.2 a), we find that multiplication of (2.1) by A produces

$$E(\alpha Y_1 + \beta Y_2) - \alpha E(Y_1) - \beta E(Y_2) = \phi_m,$$

showing that E is linear. Including this result in equation (2.1), we obtain

$$\phi_n = F[A^gE(\alpha Y_1 + \beta Y_2)] - \alpha FA^gE(Y_1) - \beta FA^gE(Y_2),$$

showing that F is linear over $A^g(R(A))$ for every generalized inverse $A^g$ of A, by Theorem 2.1. Now consider the set of matrices defined by

$$H(Z) = A^+ + (I - A^+A)Z,$$

where Z is an arbitrary n×m matrix with the restriction that Z maps R(A) into N(A). Now AH(Z)A = A, showing that H(Z) is a generalized inverse of A, so that F is linear over H(Z)(R(A)) for all Z. Since Z is an arbitrary map from R(A) into N(A), for each $Y_0 \in R(A)$ and $X_0 \in N(A)$ there is some Z so that

$$H(Z)Y_0 = A^+Y_0 + X_0.$$

Allowing $Y_0$ to vary over R(A), we see that F is linear over $R(A^+) \oplus N(A) = V_n$.

n 

COROLLARY 2.5 [37]: Let $V_n = \ell_q(n)$, $V_m = \ell_p(m)$ for m ≥ 3, n ≥ 3. Then B is linear for every m×n matrix A if and only if p = q = 2.

COROLLARY 2.6 [37]: If A is an (n+1)×n matrix of rank n, then the norm generalized inverse B is linear.

One property of the p-q generalized inverse when p = q = 2 (the 2-2 g.i.) is that the q-p g.i. of the p-q g.i. is $(A^+)^+ = A$, a property of the symmetry of A and $A^+$. An interesting problem is to define a norm g.i. C of a norm g.i. B of A and determine if C = A. In general, as was seen in Theorem 2.4 and Corollary 2.5, B is nonlinear.



Observe that in the fundamental existence theorem, Theorem 2.1, the linearity of A was required to make N(A) and R(A) finite dimensional subspaces. That N(A) and R(A) be subspaces was required to associate N(A) and R(A) with the finite dimensional subspace M of Theorem 1.3. By the remark after Theorem 1.3, observe that the only properties of M required are that M be closed, complete, and convex (convex for uniqueness). A more general theorem is therefore:

For each $Y \in V_m$ and every pair of strictly convex norms, there exists a best approximate solution $X_0 \in V_n$ of A(X) = Y if R(A) and N(A) are closed and complete subsets of $V_m$ and $V_n$, respectively. If R(A) and N(A) are also convex, $X_0$ is unique.

Consequently, in order to answer the question posed in this remark, it must be determined whether R(B) and N(B) each satisfy the hypotheses of this theorem.

LEMMA 2.6: For any norm generalized inverse B of A, R(B) = N(F).

Proof: Let $Y \in V_m$ and $B = (I - F)A^gE$. Then E(Y) ∈ R(A), so that $E(V_m) = R(A)$, since E(Y) = Y if Y ∈ R(A). Therefore, letting $Z = A^gE(Y) - A^+E(Y)$ and noticing again that E(Y) ∈ R(A),

$$AZ = AA^gE(Y) - AA^+E(Y) = E(Y) - E(Y) = \phi_m,$$

since $AA^g$ and $AA^+$ are both projections onto R(A) by Theorem 1.8. Consequently Z ∈ N(A), and by Theorem 1.4 d),

$$(I - F)A^gE(Y) = A^+E(Y) + Z - F(A^+E(Y) + Z) = A^+E(Y) + Z - F(A^+E(Y)) - Z = (I - F)A^+E(Y), \qquad (2.2)$$

so that

$$R(B) = R((I - F)A^+E) = (I - F)R(A^+E) = (I - F)R(A^+).$$

To show $(I - F)R(A^+) = N(F)$, first let X ∈ R(B). Then there is a $Z \in R(A^+)$ such that X = (I - F)Z and, by Theorem 1.4 f),

$$F(X) = F[Z - F(Z)] = \phi_n,$$

and therefore X ∈ N(F) and R(B) ⊂ N(F).

Suppose X ∈ N(F). Then $F(X) = \phi_n$. Since $X \in V_n$, X can be written as $X = X_1 + X_2$, where $X_1 \in N(A) = R(F)$ and $X_2 \in R(A^+)$, by Theorem 1.8 e). Since $X_1 \in N(A)$, Theorem 1.4 d) gives

$$\phi_n = F(X) = F(X_2 + X_1) = F(X_2) + X_1,$$

so that $X_1 = -F(X_2)$ and $X = X_2 - F(X_2) = (I - F)X_2$ with $X_2 \in R(A^+)$. Hence $X \in (I - F)R(A^+) = R(B)$, and N(F) = R(B).

LEMMA 2.7: For any norm generalized inverse B of A, N(B) = N(E).

Proof: Now $N(B) = \{Z : (I - F)A^gE(Z) = \phi_n\}$. Since $A^g$ is linear and $F(\phi_n) = \phi_n$, N(E) ⊂ N(B).

Let Z ∈ N(B). Since $Z \in V_m$ and E is a metric projection, Z can be written as $Z = Z_0 + Z_1$, where $E(Z_0) = \phi_m$ and $E(Z_1) = Z_1$, by Corollary 1.7 c). Thus

$$\phi_n = B(Z) = (I - F)A^gE(Z_0 + Z_1) = (I - F)A^g(Z_1) = (I - F)A^+(Z_1)$$

by the properties of E and Equation (2.2). Now $(I - F)A^+(Z_1) = \phi_n$, or $FA^+(Z_1) = A^+Z_1$, is true if and only if $A^+Z_1 \in N(A)$, since F is a projection on N(A). But $A^+Z_1 \in R(A^+)$, so that $A^+Z_1 = \phi_n$, since $V_n = N(A) \oplus R(A^+)$ by Theorem 1.8 e).

Since E is a projection on R(A) and $E(Z_1) = Z_1$, $Z_1 \in R(A)$, so that $Z_1 = AA^+Z_1 = A\phi_n = \phi_m$. Thus $Z = Z_0 + Z_1 = Z_0 \in N(E)$. Therefore, N(B) = N(E).

THEOREM 2.8: For any norm generalized inverse B of A, N(B) and R(B) are closed sets.

Proof: Observe that

$$N(B) = N(E) = \{Z : E(Z) = \phi_m\}$$

from Lemma 2.7 and

$$R(B) = N(F) = \{Z : F(Z) = \phi_n\}$$

from Lemma 2.6. Now E and F are both continuous by Corollary 1.13. Since the inverse image of a closed set under a continuous function is closed, N(B) and R(B) are closed, since $E^{-1}(\{\phi_m\}) = N(B)$ and $F^{-1}(\{\phi_n\}) = R(B)$, where the superscript -1 denotes the inverse image.

These results can be summarized in

THEOREM 2.9: The norm generalized inverse C of the norm generalized inverse B of a matrix A always exists. The solution set $S_X$ is a single point if p = q = 2, or n, m ≤ 2, or A is an (n+1)×n matrix of rank n (i.e., if B is linear).

Proof: By the generalized existence theorem, a best approximate solution $Y_0$ to the equation

$$B(Y) = X$$

exists for each $X \in V_n$ if R(B) and N(B) are closed and complete subsets of $V_n$ and $V_m$, respectively. By Theorem 2.8, R(B) and N(B) are closed sets, and, as closed subsets of finite dimensional spaces, they are also complete. This shows the existence of the norm generalized inverse C of B defined by $C(X) = S_X$, where $S_X$ is the set of best approximate solutions $Y_0$ to B(Y) = X, for every $X \in V_n$.

By the generalized existence theorem, $S_X$ is a single point for every X if R(B) and N(B) are convex, which is true if B is linear, which in turn is true for all A if and only if p = q = 2 (when m, n ≥ 3) or n, m ≤ 2, by Theorems 1.11 and 2.4 and Corollary 2.5, or if A is an (n+1)×n matrix of rank n, by Corollary 2.6.

For the norm generalized inverse C of the norm generalized inverse B of an m×n matrix A to equal A, C must be expressible as a matrix and, as such, must have a unique image for each $X \in V_n$. This is the case if and only if B is linear.



CHAPTER III 


CALCULATION OF THE p-q GENERALIZED INVERSE
FOR $\ell_p$ AND $\ell_q$ SPACES

3.1 Preliminary Results

Prior to defining and proving an algorithm for the calculation of the p-q generalized inverse, which is defined in Section 1.1, several preliminary results are necessary.


THEOREM 1.1: Let $V_{11}$, $V_{12}$, $V_{21}$, and $V_{22}$ be real n×n, n×m, m×n, and m×m matrices, respectively. Define

1. $R_{11} = [V_{11} - V_{12}V_{22}^+V_{21}]^+$

2. $R_{12} = -R_{11}V_{12}V_{22}^+$

3. $R_{21} = -V_{22}^+V_{21}R_{11}$

4. $R_{22} = V_{22}^+ + V_{22}^+V_{21}R_{11}V_{12}V_{22}^+$

5. $R^*_{11} = V_{11}^+ + V_{11}^+V_{12}R^*_{22}V_{21}V_{11}^+$

6. $R^*_{12} = -V_{11}^+V_{12}R^*_{22}$

7. $R^*_{21} = -R^*_{22}V_{21}V_{11}^+$

8. $R^*_{22} = [V_{22} - V_{21}V_{11}^+V_{12}]^+$

and let

$$V = \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix}, \qquad R = \begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix}, \qquad R^* = \begin{bmatrix} R^*_{11} & R^*_{12} \\ R^*_{21} & R^*_{22} \end{bmatrix}.$$

Then $R = V^+$ if and only if

1'. $[I - V_{22}V_{22}^+]V_{21} = \phi$

2'. $V_{12}[I - V_{22}^+V_{22}] = \phi$

3'. $V_{21}[I - R_{11}R_{11}^+] = \phi$

4'. $[I - R_{11}^+R_{11}]V_{12} = \phi$

and $R^* = V^+$ if and only if

5'. $[I - V_{11}V_{11}^+]V_{12} = \phi$

6'. $V_{21}[I - V_{11}^+V_{11}] = \phi$

7'. $V_{12}[I - R^*_{22}(R^*_{22})^+] = \phi$

8'. $[I - (R^*_{22})^+R^*_{22}]V_{21} = \phi$

If all eight conditions hold, then $R = R^* = V^+$, which then implies

1". $[V_{11} - V_{12}V_{22}^+V_{21}]^+ = V_{11}^+ + V_{11}^+V_{12}[V_{22} - V_{21}V_{11}^+V_{12}]^+V_{21}V_{11}^+$

2". $[V_{22} - V_{21}V_{11}^+V_{12}]^+ = V_{22}^+ + V_{22}^+V_{21}[V_{11} - V_{12}V_{22}^+V_{21}]^+V_{12}V_{22}^+$

3". $[V_{11} - V_{12}V_{22}^+V_{21}]^+V_{12}V_{22}^+ = V_{11}^+V_{12}[V_{22} - V_{21}V_{11}^+V_{12}]^+$

4". $V_{22}^+V_{21}[V_{11} - V_{12}V_{22}^+V_{21}]^+ = [V_{22} - V_{21}V_{11}^+V_{12}]^+V_{21}V_{11}^+$
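The block formulas 1 to 4 can be checked against a library pseudoinverse on a singular example. The sketch below assumes NumPy and a construction of our own (not from the text): with a singular Schur complement $T_0$ and a singular $V_{22}$, setting $V_{21} = V_{22}GT_0$, $V_{12} = T_0HV_{22}$, and $V_{11} = T_0 + V_{12}V_{22}^+V_{21}$ makes conditions 1' to 4' hold, so R should equal $V^+$. The helper names `pinv` and `schur_block_pinv`, the seed, and the singular-value cutoff `rcond=1e-10` (used to absorb rounding in the computed Schur complement) are assumptions.

```python
import numpy as np


def pinv(M):
    # Treat singular values below 1e-10 (relative) as zero to absorb rounding.
    return np.linalg.pinv(M, rcond=1e-10)


def schur_block_pinv(V11, V12, V21, V22):
    """R of Theorem 1.1 (formulas 1-4), built from the Schur complement of V22."""
    P = pinv(V22)
    R11 = pinv(V11 - V12 @ P @ V21)
    R12 = -R11 @ V12 @ P
    R21 = -P @ V21 @ R11
    R22 = P + P @ V21 @ R11 @ V12 @ P
    return np.block([[R11, R12], [R21, R22]])


rng = np.random.default_rng(3)
n, m = 3, 2
T0 = rng.normal(size=(n, 2)) @ rng.normal(size=(2, n))  # singular n x n, rank 2
V22 = np.diag([2.0, 0.0])                               # singular m x m
G, H = rng.normal(size=(m, n)), rng.normal(size=(n, m))
V21 = V22 @ G @ T0     # gives condition 1' and, through T0, condition 3'
V12 = T0 @ H @ V22     # gives condition 2' and, through T0, condition 4'
V11 = T0 + V12 @ pinv(V22) @ V21   # so that V11 - V12 V22^+ V21 = T0
V = np.block([[V11, V12], [V21, V22]])

R = schur_block_pinv(V11, V12, V21, V22)
print(np.allclose(R, pinv(V)))
```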

Proof: If all eight conditions hold, then $R = R^*$ by the uniqueness of $V^+$, giving the equalities 1", 2", 3", and 4" by equating corresponding submatrices of R and $R^*$. Thus, it need only be shown that

a) $R = V^+$ if and only if 1', 2', 3', and 4' hold;

b) $R^* = V^+$ if and only if 5', 6', 7', and 8' hold.

But the only difference between the definitions of R and $R^*$, and between conditions 1' through 4' and 5' through 8', is an interchange of the subscripts 1 and 2, so that, using a symmetrical argument, we need only prove statement a) above.
To prove a), it will be shown that the four Moore-Penrose equations of Definition 2.1e in Chapter I,

1. VRV = V

2. RVR = R

3. $(VR)^T = VR$

4. $(RV)^T = RV$

are satisfied. Therefore, consider




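$S = VRV$ block by block, writing $S_{ij}$ for its (i, j) block. No condition is needed for the (1,1) block:

$$\begin{aligned} S_{11} &= V_{11}R_{11}V_{11} + V_{12}R_{21}V_{11} + V_{11}R_{12}V_{21} + V_{12}R_{22}V_{21} \\ &= [V_{11} - V_{12}V_{22}^+V_{21}]R_{11}[V_{11} - V_{12}V_{22}^+V_{21}] + V_{12}V_{22}^+V_{21} \\ &= [V_{11} - V_{12}V_{22}^+V_{21}] + V_{12}V_{22}^+V_{21} = V_{11}. \end{aligned}$$

Next, using $R_{11}^+ = V_{11} - V_{12}V_{22}^+V_{21}$,

$$\begin{aligned} S_{21} &= V_{21}R_{11}V_{11} + V_{22}R_{21}V_{11} + V_{21}R_{12}V_{21} + V_{22}R_{22}V_{21} \\ &= V_{21}R_{11}V_{11} - V_{22}V_{22}^+V_{21}R_{11}V_{11} - V_{21}R_{11}V_{12}V_{22}^+V_{21} + V_{22}[V_{22}^+ + V_{22}^+V_{21}R_{11}V_{12}V_{22}^+]V_{21} \\ &= [I - V_{22}V_{22}^+]V_{21}R_{11}[V_{11} - V_{12}V_{22}^+V_{21}] + V_{22}V_{22}^+V_{21} \\ &= [I - V_{22}V_{22}^+]V_{21}R_{11}R_{11}^+ + V_{22}V_{22}^+V_{21} \\ &= V_{21} - [I - V_{22}V_{22}^+]V_{21}[I - R_{11}R_{11}^+], \end{aligned}$$

and therefore $S_{21} = V_{21}$ if and only if

$$[I - V_{22}V_{22}^+]V_{21}[I - R_{11}R_{11}^+] = \phi,$$

that is, $-V_{21}[I - R_{11}R_{11}^+] + V_{22}V_{22}^+V_{21}[I - R_{11}R_{11}^+] = \phi$. If conditions 1' and 3' are used, observe that

$$-V_{21}[I - R_{11}R_{11}^+] + V_{22}V_{22}^+V_{21}[I - R_{11}R_{11}^+] = \phi + V_{21}[I - R_{11}R_{11}^+] = \phi,$$

so that conditions 1' and 3' imply $S_{21} = V_{21}$.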
Similarly,

$$\begin{aligned} S_{12} &= V_{11}R_{11}V_{12} + V_{12}R_{21}V_{12} + V_{11}R_{12}V_{22} + V_{12}R_{22}V_{22} \\ &= V_{11}R_{11}V_{12} - V_{12}V_{22}^+V_{21}R_{11}V_{12} - V_{11}R_{11}V_{12}V_{22}^+V_{22} + V_{12}[V_{22}^+ + V_{22}^+V_{21}R_{11}V_{12}V_{22}^+]V_{22} \\ &= [V_{11} - V_{12}V_{22}^+V_{21}]R_{11}V_{12}[I - V_{22}^+V_{22}] + V_{12}V_{22}^+V_{22} \\ &= R_{11}^+R_{11}V_{12}[I - V_{22}^+V_{22}] + V_{12}V_{22}^+V_{22} \\ &= V_{12} - [I - R_{11}^+R_{11}]V_{12}[I - V_{22}^+V_{22}], \end{aligned}$$

and therefore $S_{12} = V_{12}$ if and only if

$$[I - R_{11}^+R_{11}]V_{12}[I - V_{22}^+V_{22}] = \phi.$$

Using conditions 2' and 4', the factor $V_{12}[I - V_{22}^+V_{22}]$ and the factor $[I - R_{11}^+R_{11}]V_{12}$ both vanish, so this product is φ, which implies $S_{12} = V_{12}$.
Finally,

$$\begin{aligned} S_{22} &= V_{21}R_{11}V_{12} + V_{22}R_{21}V_{12} + V_{21}R_{12}V_{22} + V_{22}R_{22}V_{22} \\ &= V_{21}R_{11}V_{12} - V_{22}V_{22}^+V_{21}R_{11}V_{12} - V_{21}R_{11}V_{12}V_{22}^+V_{22} + V_{22}[V_{22}^+ + V_{22}^+V_{21}R_{11}V_{12}V_{22}^+]V_{22} \\ &= V_{22} + [I - V_{22}V_{22}^+]V_{21}R_{11}V_{12}[I - V_{22}^+V_{22}], \end{aligned}$$

and therefore, writing $R_{11} = R_{11}R_{11}^+R_{11}$, $S_{22} = V_{22}$ if and only if

$$[I - V_{22}V_{22}^+]V_{21}R_{11}R_{11}^+R_{11}V_{12}[I - V_{22}^+V_{22}] = \phi.$$

If conditions 1' and 2' are used, the left factor $[I - V_{22}V_{22}^+]V_{21}$ and the right factor $V_{12}[I - V_{22}^+V_{22}]$ are both φ, so the product is φ and $S_{22} = V_{22}$. Thus conditions 1' through 4' imply VRV = V.
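2. $RVR = R$

Now, with $S = RVR$,

$$\begin{aligned}
S_{11} &= R_{11}V_{11}R_{11} + R_{12}V_{21}R_{11} + R_{11}V_{12}R_{21} + R_{12}V_{22}R_{21} \\
&= R_{11}V_{11}R_{11} - R_{11}V_{12}V_{22}^+V_{21}R_{11} - R_{11}V_{12}V_{22}^+V_{21}R_{11} + R_{11}V_{12}V_{22}^+V_{22}V_{22}^+V_{21}R_{11} \\
&= R_{11}\left[V_{11} - V_{12}V_{22}^+V_{21}\right]R_{11} = R_{11}R_{11}^+R_{11} = R_{11},
\end{aligned}$$

$$\begin{aligned}
S_{21} &= R_{21}V_{11}R_{11} + R_{22}V_{21}R_{11} + R_{21}V_{12}R_{21} + R_{22}V_{22}R_{21} \\
&= -V_{22}^+V_{21}R_{11}V_{11}R_{11} + V_{22}^+V_{21}R_{11}V_{12}V_{22}^+V_{21}R_{11} + \left[V_{22}^+ + V_{22}^+V_{21}R_{11}V_{12}V_{22}^+\right]\left[I - V_{22}V_{22}^+\right]V_{21}R_{11} \\
&= -V_{22}^+V_{21}R_{11}\left[V_{11} - V_{12}V_{22}^+V_{21}\right]R_{11} + 0 = -V_{22}^+V_{21}R_{11} = R_{21},
\end{aligned}$$

$$\begin{aligned}
S_{12} &= R_{11}V_{11}R_{12} + R_{12}V_{21}R_{12} + R_{11}V_{12}R_{22} + R_{12}V_{22}R_{22} \\
&= -R_{11}V_{11}R_{11}V_{12}V_{22}^+ + R_{11}V_{12}V_{22}^+V_{21}R_{11}V_{12}V_{22}^+ + R_{11}V_{12}\left[I - V_{22}^+V_{22}\right]\left[V_{22}^+ + V_{22}^+V_{21}R_{11}V_{12}V_{22}^+\right] \\
&= -R_{11}\left[V_{11} - V_{12}V_{22}^+V_{21}\right]R_{11}V_{12}V_{22}^+ + 0 = -R_{11}V_{12}V_{22}^+ = R_{12},
\end{aligned}$$

$$\begin{aligned}
S_{22} &= R_{21}V_{11}R_{12} + R_{22}V_{21}R_{12} + R_{21}V_{12}R_{22} + R_{22}V_{22}R_{22} \\
&= V_{22}^+V_{21}R_{11}V_{11}R_{11}V_{12}V_{22}^+ - V_{22}^+V_{21}R_{11}V_{12}\left[V_{22}^+ + V_{22}^+V_{21}R_{11}V_{12}V_{22}^+\right] \\
&\quad + R_{22}\left\{V_{22}V_{22}^+ - \left[I - V_{22}V_{22}^+\right]V_{21}R_{11}V_{12}V_{22}^+\right\} \\
&= V_{22}^+V_{21}R_{11}\left[V_{11} - V_{12}V_{22}^+V_{21}\right]R_{11}V_{12}V_{22}^+ - V_{22}^+V_{21}R_{11}V_{12}V_{22}^+ + R_{22}V_{22}V_{22}^+ \\
&= V_{22}^+V_{21}R_{11}V_{12}V_{22}^+ - V_{22}^+V_{21}R_{11}V_{12}V_{22}^+ + R_{22} = R_{22},
\end{aligned}$$

where $R_{22}\left[I - V_{22}V_{22}^+\right] = 0$, $\left[I - V_{22}^+V_{22}\right]V_{22}^+ = 0$, and $R_{22}V_{22}V_{22}^+ = R_{22}$ all follow from $V_{22}^+V_{22}V_{22}^+ = V_{22}^+$.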
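Now $S = VR$, with

$$S_{11} = V_{11}R_{11} + V_{12}R_{21} = V_{11}R_{11} - V_{12}V_{22}^+V_{21}R_{11} = \left[V_{11} - V_{12}V_{22}^+V_{21}\right]R_{11} = R_{11}^+R_{11},$$

$$S_{21} = V_{21}R_{11} + V_{22}R_{21} = V_{21}R_{11} - V_{22}V_{22}^+V_{21}R_{11} = \left[I - V_{22}V_{22}^+\right]V_{21}R_{11},$$

$$\begin{aligned}
S_{12} &= V_{11}R_{12} + V_{12}R_{22} = -V_{11}R_{11}V_{12}V_{22}^+ + V_{12}V_{22}^+ + V_{12}V_{22}^+V_{21}R_{11}V_{12}V_{22}^+ \\
&= V_{12}V_{22}^+ - \left[V_{11} - V_{12}V_{22}^+V_{21}\right]R_{11}V_{12}V_{22}^+ = \left[I - R_{11}^+R_{11}\right]V_{12}V_{22}^+.
\end{aligned}$$

Consequently $S_{12} = S_{21}^T$ if and only if

$$\left[I - V_{22}V_{22}^+\right]V_{21}R_{11} = \left(\left[I - R_{11}^+R_{11}\right]V_{12}V_{22}^+\right)^T,$$

which is condition 2'. Also

$$\begin{aligned}
S_{22} &= V_{21}R_{12} + V_{22}R_{22} = -V_{21}R_{11}V_{12}V_{22}^+ + V_{22}V_{22}^+ + V_{22}V_{22}^+V_{21}R_{11}V_{12}V_{22}^+ \\
&= V_{22}V_{22}^+ - \left[I - V_{22}V_{22}^+\right]V_{21}R_{11}V_{12}V_{22}^+,
\end{aligned}$$

and therefore $S_{22} = S_{22}^T$ if and only if

$$\left(\left[I - V_{22}V_{22}^+\right]V_{21}R_{11}V_{12}V_{22}^+\right)^T = \left[I - V_{22}V_{22}^+\right]V_{21}R_{11}V_{12}V_{22}^+.$$

Given conditions 2' and 4',

$$\left[I - V_{22}V_{22}^+\right]V_{21}R_{11}V_{12}V_{22}^+ = \left(\left[I - R_{11}^+R_{11}\right]V_{12}V_{22}^+\right)^TV_{12}V_{22}^+ = V_{22}^{+T}\left(\left[I - R_{11}^+R_{11}\right]V_{12}\right)^TV_{12}V_{22}^+ = 0,$$

which implies $S_{22} = S_{22}^T$.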
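Now $S = RV$, with

$$S_{11} = R_{11}V_{11} + R_{12}V_{21} = R_{11}V_{11} - R_{11}V_{12}V_{22}^+V_{21} = R_{11}R_{11}^+,$$

$$\begin{aligned}
S_{21} &= R_{21}V_{11} + R_{22}V_{21} = -V_{22}^+V_{21}R_{11}V_{11} + \left[V_{22}^+ + V_{22}^+V_{21}R_{11}V_{12}V_{22}^+\right]V_{21} \\
&= V_{22}^+V_{21} - V_{22}^+V_{21}R_{11}\left[V_{11} - V_{12}V_{22}^+V_{21}\right] = V_{22}^+V_{21}\left[I - R_{11}R_{11}^+\right],
\end{aligned}$$

$$S_{12} = R_{11}V_{12} + R_{12}V_{22} = R_{11}V_{12} - R_{11}V_{12}V_{22}^+V_{22} = R_{11}V_{12}\left[I - V_{22}^+V_{22}\right],$$

and therefore $S_{12} = S_{21}^T$ if and only if

$$V_{22}^+V_{21}\left[I - R_{11}R_{11}^+\right] = \left(R_{11}V_{12}\left[I - V_{22}^+V_{22}\right]\right)^T,$$

which is condition 1'. Also

$$\begin{aligned}
S_{22} &= R_{21}V_{12} + R_{22}V_{22} = -V_{22}^+V_{21}R_{11}V_{12} + V_{22}^+V_{22} + V_{22}^+V_{21}R_{11}V_{12}V_{22}^+V_{22} \\
&= V_{22}^+V_{22} - V_{22}^+V_{21}R_{11}V_{12}\left[I - V_{22}^+V_{22}\right],
\end{aligned}$$

and therefore $S_{22} = S_{22}^T$ if and only if

$$V_{22}^+V_{21}R_{11}V_{12}\left[I - V_{22}^+V_{22}\right] = \left(V_{22}^+V_{21}R_{11}V_{12}\left[I - V_{22}^+V_{22}\right]\right)^T.$$

Given conditions 1' and 4',

$$V_{22}^+V_{21}R_{11}V_{12}\left[I - V_{22}^+V_{22}\right] = V_{22}^+V_{21}\left(V_{22}^+V_{21}\left[I - R_{11}R_{11}^+\right]\right)^T = V_{22}^+V_{21}\left[I - R_{11}R_{11}^+\right]V_{21}^TV_{22}^{+T},$$

which is symmetric and therefore implies $S_{22} = S_{22}^T$.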
In each of the above calculations, it has been shown that conditions 1', 2', 3', and 4' imply that $R = V^+$. Observe also that $R = V^+$ explicitly implied condition 1' in Penrose equation 4, and condition 2' in Penrose equation 3. To show the implication of 3', first premultiply 1' by $V_{22}$ to obtain

$$V_{22}V_{22}^+V_{21}\left[I - R_{11}R_{11}^+\right] = V_{22}\left[I - V_{22}^+V_{22}\right]V_{12}^TR_{11}^T = 0.$$

This shows that in Penrose equation 1, $S_{21} = V_{21}$ if and only if

$$-V_{21}\left[I - R_{11}R_{11}^+\right] + V_{22}V_{22}^+V_{21}\left[I - R_{11}R_{11}^+\right] = -V_{21}\left[I - R_{11}R_{11}^+\right] = 0,$$

which is $V_{21}\left[I - R_{11}R_{11}^+\right] = 0$, or condition 3'. All the conditions implied by $R = V^+$ are satisfied simultaneously (note that $R = V^+$ implies that condition 1' does not require condition 3' to be satisfied). To show the implication of 4', postmultiply the transpose of 2' by $V_{22}$ to obtain

$$\left[I - R_{11}^+R_{11}\right]V_{12}V_{22}^+V_{22} = R_{11}^TV_{21}^T\left[I - V_{22}V_{22}^+\right]V_{22} = 0,$$

so that $S_{12} = V_{12}$ in Penrose equation 1 if and only if

$$-\left[I - R_{11}^+R_{11}\right]V_{12} + \left[I - R_{11}^+R_{11}\right]V_{12}V_{22}^+V_{22} = -\left[I - R_{11}^+R_{11}\right]V_{12} = 0,$$

which is condition 4'.
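As a numerical sanity check of the block calculations above, the following sketch assembles $R$ from the block formulas that appear in the derivation ($R_{11}$ the pseudoinverse of $V_{11} - V_{12}V_{22}^+V_{21}$, $R_{12} = -R_{11}V_{12}V_{22}^+$, $R_{21} = -V_{22}^+V_{21}R_{11}$, $R_{22} = V_{22}^+ + V_{22}^+V_{21}R_{11}V_{12}V_{22}^+$), evaluates the residuals of conditions 1' through 4' in the form used above, and compares $R$ with NumPy's pseudoinverse. The function names `block_candidate` and `condition_residuals` are hypothetical, and relying on `numpy.linalg.pinv` with default tolerances is an assumption of this sketch, not part of the original.

```python
import numpy as np

def block_candidate(V11, V12, V21, V22):
    """Assemble R blockwise from the formulas used in the derivation."""
    P22 = np.linalg.pinv(V22)
    R11 = np.linalg.pinv(V11 - V12 @ P22 @ V21)    # pseudoinverse of the Schur complement
    R12 = -R11 @ V12 @ P22
    R21 = -P22 @ V21 @ R11
    R22 = P22 + P22 @ V21 @ R11 @ V12 @ P22
    return np.block([[R11, R12], [R21, R22]]), R11, P22

def condition_residuals(V11, V12, V21, V22):
    """Norms of the residuals of conditions 1'-4' (each is 0 when the condition holds)."""
    _, R11, P22 = block_candidate(V11, V12, V21, V22)
    R11p = np.linalg.pinv(R11)                      # equals V11 - V12 V22^+ V21
    I1, I2 = np.eye(R11.shape[0]), np.eye(V22.shape[0])
    c1 = P22 @ V21 @ (I1 - R11 @ R11p) - (R11 @ V12 @ (I2 - P22 @ V22)).T
    c2 = (I2 - V22 @ P22) @ V21 @ R11 - ((I1 - R11p @ R11) @ V12 @ P22).T
    c3 = V21 @ (I1 - R11 @ R11p)
    c4 = (I1 - R11p @ R11) @ V12
    return [float(np.linalg.norm(c)) for c in (c1, c2, c3, c4)]

rng = np.random.default_rng(0)
V = rng.standard_normal((5, 5))                     # generic, hence nonsingular
V11, V12, V21, V22 = V[:3, :3], V[:3, 3:], V[3:, :3], V[3:, 3:]
R, _, _ = block_candidate(V11, V12, V21, V22)
print(condition_residuals(V11, V12, V21, V22))      # all close to 0
print(np.allclose(R, np.linalg.pinv(V)))            # True

# Singular example where condition 3' fails: V = [[1, 1], [1, 1]] with 1x1 blocks.
one = np.ones((1, 1))
R_bad, _, _ = block_candidate(one, one, one, one)
print(condition_residuals(one, one, one, one))      # all nonzero
print(np.allclose(R_bad, np.linalg.pinv(np.ones((2, 2)))))  # False
```

In the singular example the Schur complement is zero, so $R_{11} = 0$ and $V_{21}[I - R_{11}R_{11}^+] = V_{21} \ne 0$; the block candidate is then not the pseudoinverse, which is what the conditions predict.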

DEFINITION 1.2 [22]: A real n×n matrix A is called EPr if and only if it satisfies the conditions:

1. A has rank r.

2. $\sum_{i=1}^{n} x_iA_i = 0$ if and only if $\sum_{i=1}^{n} x_iA^i = 0$ for all real $x_i$, where $A^i$ is the $i$th row and $A_i$ is the $i$th column of A.

Condition 2 can be written in matrix notation as

$$AX = 0 \text{ if and only if } A^TX = 0,$$

or that

$$\{X : AX = 0\} = \{X : A^TX = 0\},$$

which is $N(A) = N(A^T)$. If the rank is understood, an EPr matrix will be referred to as an EP matrix.


LEMMA 1.3: If $V_1$, $V_2$, and $V_3$ are subspaces of the n-dimensional real space $V_n$ such that

$$V_n = V_1 \oplus V_2 = V_1 \oplus V_3,$$

and if $V_1$ is orthogonal to both $V_2$ and $V_3$, then $V_2 = V_3$.

Proof: Suppose that $X \in V_2$. Since $X \in V_n$ and $V_n = V_1 \oplus V_3$, there are vectors $v_1 \in V_1$, $v_3 \in V_3$ such that $X = v_1 + v_3$. Let $\{b_{1i}\}_{i=1}^{r}$ be an orthogonal basis for $V_1$ and $\{b_{3i}\}_{i=1}^{n-r}$ be an orthogonal basis for $V_3$. Then by the orthogonality hypothesis, $b_{1i}^Tb_{3j} = 0$ for $i = 1, \dots, r$ and $j = 1, \dots, n-r$ [17, p. 24, p. 34]. Therefore, there are constants $\{a_{1i}\}_{i=1}^{r}$ and $\{a_{3i}\}_{i=1}^{n-r}$ such that

$$X = \sum_{i=1}^{r} a_{1i}b_{1i} + \sum_{i=1}^{n-r} a_{3i}b_{3i}.$$

Since $X \in V_2$ and $V_1$ is orthogonal to $V_2$, $X^Tv_1 = 0$ for all $v_1 \in V_1$, so that for $j = 1, \dots, r$,

$$0 = b_{1j}^TX = \sum_{i=1}^{r} a_{1i}b_{1j}^Tb_{1i} + \sum_{i=1}^{n-r} a_{3i}b_{1j}^Tb_{3i} = a_{1j}b_{1j}^Tb_{1j},$$

which implies $a_{1j} = 0$ and therefore

$$X = \sum_{i=1}^{n-r} a_{3i}b_{3i} \in V_3,$$

or that $V_2 \subseteq V_3$. Exchanging $V_2$ for $V_3$ and basis $b_{2i}$ for $b_{3i}$ in the above proof, $V_3 \subseteq V_2$, which implies $V_2 = V_3$.

THEOREM 1.4: A real n×n matrix A is EPr if and only if $A^+$ is EPr. Further, A is EPr if and only if $R(A) = R(A^T)$.

Proof: Note that

$$\begin{aligned}
N(A^+) &= \{X : A^+X = 0\} \\
&= \{X : (I - AA^+)Z = X \text{ for some } Z\} \\
&= \{X : (I - AA^+)(I - AA^+)Z = X \text{ for some } Z\} \\
&= \{X : (I - AA^+)X = X\} \\
&= \{X : AA^+X = 0\} \\
&= \{X : (AA^+)^TX = 0\} \\
&= \{X : A^{+T}A^TX = 0\} \\
&= \{X : (I - A^{+T}A^T)(I - A^{+T}A^T)Z = X \text{ for some } Z\} \\
&= \{X : A^TX = 0\} \\
&= N(A^T).
\end{aligned}$$

Similarly, by replacing $A$ with $A^+$ and $A^T$ with $A^{+T}$, $N(A^{+T}) = N(A)$. Therefore, $N(A) = N(A^T)$ if and only if $N(A^{+T}) = N(A^+)$, which is to say (since A and $A^+$ have the same rank) that A is EPr if and only if $A^+$ is EPr.

For the second part, note that $V_n$, the real n-dimensional vector space, is the direct sum

$$V_n = N(A^+) \oplus R(A) = N(A^{+T}) \oplus R(A^T)$$

by Theorem 2.8e in Chapter II. Now A is EPr if and only if $A^+$ is EPr, if and only if $N(A^+) = N(A^{+T})$, which implies $R(A) = R(A^T)$ by Lemma 1.3, since $R(A)$ is orthogonal to $N(A^+)$ and $R(A^T)$ is orthogonal to $N(A^{+T})$. Also by Lemma 1.3, if $R(A) = R(A^T)$, then $N(A^+) = N(A^{+T})$ and A is EPr.
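Theorem 1.4 gives a convenient numerical test: A is EP exactly when $R(A) = R(A^T)$, and since $AA^+$ and $A^+A$ are the orthogonal projections onto $R(A)$ and $R(A^T)$, this is the same as $AA^+ = A^+A$. The sketch below uses that projector comparison; the function name `is_ep` and the tolerance are choices of this sketch, not of the text.

```python
import numpy as np

def is_ep(A, tol=1e-10):
    """EP test via Theorem 1.4: R(A) = R(A^T) iff the projectors A A^+ and A^+ A agree."""
    Ap = np.linalg.pinv(A)
    return np.allclose(A @ Ap, Ap @ A, atol=tol)

B = np.array([[1.0, 2.0], [2.0, 4.0]])   # symmetric, rank 1: EP
N = np.array([[0.0, 1.0], [0.0, 0.0]])   # nilpotent: N(A) = span(e1), N(A^T) = span(e2)
for M in (B, N):
    # A and A^+ give the same answer, as the first part of the theorem states
    print(is_ep(M), is_ep(np.linalg.pinv(M)))
```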

COROLLARY 1.5 : Given V, R, and R as in Theorem 1.1. 

a) If $V_{22}$ is nonsingular, conditions 1' and 2' reduce to 3' and 4'.

b) If $V_{11}$ is nonsingular, conditions 5' and 6' reduce to 7' and 8'.

c) If $R_{11}$ and $V_{22}$ are EP matrices and $V_{12} = V_{21}^T$, conditions 2' and 4' reduce to 1' and 3'.

d) If $R_{22}$ and $V_{11}$ are EP matrices and $V_{12} = V_{21}^T$, conditions 6' and 8' reduce to 5' and 7'.

e) If $R_{11}$ is nonsingular, conditions 1', 2', 3', and 4' reduce to

9'. $V_{12}\left[I - V_{22}^+V_{22}\right] = 0$

10'. $\left[I - V_{22}V_{22}^+\right]V_{21} = 0$.

f) If $R_{22}$ is nonsingular, conditions 5', 6', 7', and 8' reduce to

11'. $V_{21}\left[I - V_{11}^+V_{11}\right] = 0$

12'. $\left[I - V_{11}V_{11}^+\right]V_{12} = 0$.

g) If a), c), and d) hold, then the eight conditions reduce to 3', 5', and 7'.

h) If b), c), and d) hold, then the eight conditions reduce to 1', 3', and 7'.

i) If a), c), d), and f) hold, then the eight conditions reduce to 3' and 11'.

j) If b), c), d), and e) hold, then the eight conditions reduce to 7' and 9'.

k) If a), b), e), and f) hold, no conditions exist, as this is a full rank case.

Proof: The proofs of statements b), d), f), h), and j) are analogous to the proofs of a), c), e), g), and i), respectively, by interchanging subscripts. Therefore, only statements a), c), e), g), i), and k) will be proven.

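a) If $V_{22}$ is nonsingular, $V_{22}^+ = V_{22}^{-1}$ and $\left[I - V_{22}^+V_{22}\right] = \left[I - V_{22}V_{22}^+\right] = 0$, so that conditions 1' and 2' are simply

1'. $V_{22}^{-1}V_{21}\left[I - R_{11}R_{11}^+\right] = 0$

2'. $\left[I - R_{11}^+R_{11}\right]V_{12}V_{22}^{-1} = 0$.

Premultiplying 1' and postmultiplying 2' by $V_{22}$ yield 3' and 4', respectively.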

c) If condition 3' is transposed, the condition becomes

$$\left[I - R_{11}R_{11}^+\right]V_{21}^T = \left[I - R_{11}R_{11}^+\right]V_{12} = 0,$$

or that the columns of $V_{12}$ lie in $N(I - R_{11}R_{11}^+) = N(I - R_{11}^+R_{11})$ by the EP hypothesis on $R_{11}$ (by Theorem 1.4, $R(R_{11}) = R(R_{11}^T)$, so the projections $R_{11}R_{11}^+$ and $R_{11}^+R_{11}$ coincide). Therefore $\left[I - R_{11}^+R_{11}\right]V_{12} = 0$, which is condition 4'. To show that 1' and 2' are equivalent, note that $R(R_{11}^T) = R(R_{11})$ since $R_{11}$ is EP, and therefore

$$R(V_{21}R_{11}^T) = R(V_{21}R_{11}). \qquad (1.1)$$

Since $V_{22}$ is EP, $N(V_{22}^T) = N(V_{22})$. From Theorem 1.8 of Chapter II, when $X \in V_n$, then X can be written uniquely as $X = U + W$, where $U \in N(V_{22}^+) = N(V_{22}^T)$ and $W \in R(V_{22}) = R(V_{22}^T)$ by Theorem 1.4. Since $I - V_{22}V_{22}^+$ is a projection onto $N(V_{22}^+)$ and $I - V_{22}^+V_{22}$ is a projection onto $N(V_{22})$,

$$\left[I - V_{22}V_{22}^+\right]X = \left[I - V_{22}V_{22}^+\right](U + W) = U = \left[I - V_{22}^+V_{22}\right](U + W) = \left[I - V_{22}^+V_{22}\right]X$$

for all $X \in V_n$, so that

$$I - V_{22}V_{22}^+ = I - V_{22}^+V_{22} = I - V_{22}^TV_{22}^{+T}.$$

Considering 2' with 4' substituted gives $\left[I - V_{22}V_{22}^+\right]V_{21}R_{11} = 0$, showing that $R(V_{21}R_{11}) \subseteq N(I - V_{22}V_{22}^+)$. From equation (1.1), then $R(V_{21}R_{11}^T) \subseteq N(I - V_{22}V_{22}^+) = N(I - V_{22}^+V_{22})$, so that

$$0 = \left[I - V_{22}^+V_{22}\right]V_{21}R_{11}^T = \left(R_{11}V_{12}\left[I - V_{22}^+V_{22}\right]\right)^T,$$

which is condition 1' if 3' is substituted.

e) If $R_{11}$ is nonsingular, $R_{11}^+ = R_{11}^{-1}$, so that

$$\left[I - R_{11}R_{11}^+\right] = 0, \qquad \left[I - R_{11}^+R_{11}\right] = 0,$$

and therefore conditions 3' and 4' are automatically satisfied along with the left side of 1' and the right side of 2', reducing 1' and 2' to the conditions

1'. $R_{11}V_{12}\left[I - V_{22}^+V_{22}\right] = 0$

2'. $\left[I - V_{22}V_{22}^+\right]V_{21}R_{11} = 0$,

which are conditions 9' and 10' after premultiplication of 1' and postmultiplication of 2' by $R_{11}^{-1}$.

g) If a) holds, then 1', 2', 3', and 4' are reduced to 3' and 4', which are further reduced to 3' if c) holds. If d) holds, then 5', 6', 7', and 8' are reduced to 5' and 7'.

i) If a), c), and d) hold, then 1', 2', 3', and 4' are reduced to 3' by g). If d) holds, then 5', 6', 7', and 8' are reduced to 5' and 7'. If f) holds, then 5' is reduced to 11', and 7' is automatically satisfied. (See e).)


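k) If $V_{22}^+ = V_{22}^{-1}$, $V_{11}^+ = V_{11}^{-1}$, $R_{11}^+ = R_{11}^{-1}$, and $R_{22}^+ = R_{22}^{-1}$, then

$$\left[I - V_{22}V_{22}^+\right] = 0 = \left[I - V_{22}^+V_{22}\right], \qquad \left[I - V_{11}V_{11}^+\right] = 0 = \left[I - V_{11}^+V_{11}\right],$$

$$\left[I - R_{11}R_{11}^+\right] = 0 = \left[I - R_{11}^+R_{11}\right], \qquad \left[I - R_{22}R_{22}^+\right] = 0 = \left[I - R_{22}^+R_{22}\right],$$

which automatically satisfies all eight conditions. Note then that

$$R_{11} = V_{11}^{-1} + V_{11}^{-1}V_{12}R_{22}V_{21}V_{11}^{-1}, \qquad R_{22} = V_{22}^{-1} + V_{22}^{-1}V_{21}R_{11}V_{12}V_{22}^{-1},$$

which is the "inside-out" rule [14, pp. 45-49], [27, pp. 24-26], [43, p. 29].

The above corollary can be used to obtain Theorem 6 (iii) of Lewis and Newman [26].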



LEMMA 1.6: If A is an n×n positive semidefinite matrix and B is m×n, then

$$N(A + B^TB) = N(A) \cap N(B).$$

Proof: If $X \in N(A + B^TB)$, then $(A + B^TB)X = 0$, or that

$$-AX = B^TBX. \qquad (1.2)$$

Further, since $X^TB^TBX = (BX)^TBX \ge 0$,

$$0 = X^T(A + B^TB)X = X^TAX + X^TB^TBX \ge 0,$$

so that equation (1.2) becomes $-X^TAX = X^TB^TBX$, which is true if and only if

$$X^TAX = 0 = X^TB^TBX = (BX)^TBX, \qquad (1.3)$$

since A and $B^TB$ are positive semidefinite. Consequently, $(A + B^TB)X = 0$ implies $BX = 0$, so that $AX = 0$ from (1.2). Therefore $X \in N(A)$ and $X \in N(B)$, which is $X \in N(A) \cap N(B)$. Conversely, if $X \in N(A) \cap N(B)$, then $X \in N(A + B^TB)$.
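Lemma 1.6 can be checked numerically by comparing null-space dimensions, since $N(A) \cap N(B) = N\!\left(\begin{bmatrix} A \\ B \end{bmatrix}\right)$, and then confirming that both A and B annihilate a basis of $N(A + B^TB)$. The random test matrices, the helper name `null_dim`, and the rank tolerance are assumptions of this sketch.

```python
import numpy as np

def null_dim(M, tol=1e-10):
    """dim N(M) = number of columns minus numerical rank."""
    return M.shape[1] - np.linalg.matrix_rank(M, tol=tol)

rng = np.random.default_rng(1)
G = rng.standard_normal((2, 5))
A = G.T @ G                        # 5x5 positive semidefinite, rank 2
B = rng.standard_normal((1, 5))    # 1x5
lhs = null_dim(A + B.T @ B)
rhs = null_dim(np.vstack([A, B]))  # N(A) ∩ N(B) = N([A; B])
print(lhs, rhs)

# Basis of N(A + B^T B) from the SVD; A and B should both annihilate it.
_, s, Vt = np.linalg.svd(A + B.T @ B)
Z = Vt[np.sum(s > 1e-10):].T
print(np.allclose(A @ Z, 0), np.allclose(B @ Z, 0))
```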

LEMMA 1.7 [26]: If A is positive semidefinite, then A is EP.

COROLLARY 1.8 [26]: If A is a positive semidefinite n×n matrix, C is an r×n matrix, and $\bar A = A + C^TC$, then

$$\bar A^+ = A^+ - A^+C^T(I + CA^+C^T)^{-1}CA^+$$

if and only if $N(A) \subseteq N(C)$.

Proof: Let $V_{11} = A$, $V_{21} = C = V_{12}^T$, and $V_{22} = -I$. Notice that $R_{11}^+ = A + C^TC = \bar A$ is positive semidefinite by the proof of Lemma 1.6, equation (1.3), and therefore $\bar A$ is EP, as is $V_{11}$, by Lemma 1.7, implying that $R_{11}$ is EP by Theorem 1.4. Now

$$R_{22}^+ = V_{22} - V_{21}V_{11}^+V_{12} = -\left[I + CA^+C^T\right].$$

Since $A^+$ is positive semidefinite, $CA^+C^T$ is positive semidefinite. Letting $CA^+C^T$ be "A" and I be "B" of Lemma 1.6, we have

$$r(I + CA^+C^T) = r - \dim N(I + CA^+C^T) = r - \dim\left[N(I) \cap N(CA^+C^T)\right] = r,$$

showing that $R_{22}^+ = -\left[I + CA^+C^T\right]$ is nonsingular, where $r(\cdot)$ stands for the rank of a matrix. Therefore, all of the hypotheses of Corollary 1.5 i) are satisfied, showing that

$$(A + C^TC)^+ = A^+ + A^+C^T\left\{-(I + CA^+C^T)^{-1}\right\}CA^+ = \bar A^+$$

if and only if 3' and 11' hold, which are

$$C\left[I - \bar A\bar A^+\right] = 0, \qquad C\left[I - A^+A\right] = 0.$$

Now 3' holds if and only if $R(I - \bar A\bar A^+) \subseteq N(C)$. However, $I - \bar A\bar A^+$ is the projection onto $N(\bar A)$, so that 3' holds if and only if $N(\bar A) \subseteq N(C)$. But $N(\bar A) = N(A) \cap N(C) \subseteq N(C)$ by Lemma 1.6, so 3' always holds. The remaining condition is 11', which holds if and only if $R(I - A^+A) \subseteq N(C)$, which is if and only if $N(A) \subseteq N(C)$.
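The update formula of Corollary 1.8 is easy to test: build a singular positive semidefinite A, take one C whose rows lie in $R(A)$ (so $N(A) \subseteq N(C)$) and one generic C (so the inclusion fails), and compare the formula with a directly computed pseudoinverse. The helper `updated_pinv`, the random construction, and the `rcond=1e-10` cutoff for numerically zero singular values are assumptions of this sketch.

```python
import numpy as np

def updated_pinv(A, C, rcond=1e-10):
    """Corollary 1.8 formula: A^+ - A^+ C^T (I + C A^+ C^T)^{-1} C A^+."""
    Ap = np.linalg.pinv(A, rcond=rcond)
    K = np.eye(C.shape[0]) + C @ Ap @ C.T
    return Ap - Ap @ C.T @ np.linalg.solve(K, C @ Ap)

rng = np.random.default_rng(2)
G = rng.standard_normal((3, 5))
A = G.T @ G                                   # psd, rank 3, N(A) = N(G)
C_good = rng.standard_normal((2, 3)) @ G      # rows in R(A): N(A) ⊆ N(C)
C_bad = rng.standard_normal((2, 5))           # generic rows: N(A) not contained in N(C)
for C in (C_good, C_bad):
    exact = np.linalg.pinv(A + C.T @ C, rcond=1e-10)
    print(np.allclose(updated_pinv(A, C), exact))   # True, then False
```

In the failing case the formula's range stays inside $R(A)$, so it has rank at most 3, while $A + C^TC$ is generically nonsingular; this is the mismatch the condition $N(A) \subseteq N(C)$ rules out.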

DEFINITION 1.9: For D a diagonal matrix with diagonal elements $d_1, d_2, \dots, d_n$, denoted by $D = \operatorname{diag}\{d_1, \dots, d_n\} = \operatorname{diag}\{d_i\}$, and any real number r, let

$$D^r = \operatorname{diag}\{d_1', d_2', \dots, d_n'\},$$

where $d_i' = d_i^r$ if $d_i \ne 0$, or $d_i' = 0$ if $d_i = 0$.

Observe that

$$D^rD^s = \operatorname{diag}\{d_i^rd_i^s\} = D^{r+s}$$

for nonzero $d_i$, with equality also holding if $d_i = 0$ for some i. If some diagonal elements are zero, ambiguity occurs for negative r. To examine this, suppose D is an n×n diagonal matrix such that

$$D = \operatorname{diag}\{d_1, \dots, d_m, 0, \dots, 0\}, \qquad d_i \ne 0.$$

Then by the above definition,

$$D^{-1} = \operatorname{diag}\{1/d_1, \dots, 1/d_m, 0, \dots, 0\},$$

so that $D^{-1}$ is really the pseudoinverse of D and should probably be denoted by $D^+$. This makes for clumsy notation; for example,

$$D^{-1/2} = \operatorname{diag}\{1/d_1^{1/2}, \dots, 1/d_m^{1/2}, 0, \dots, 0\} = (D^+)^{1/2},$$

so that, with this qualification, $D^r$ will be used for all real r whether D is nonsingular or not.
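The convention of Definition 1.9 (raise nonzero diagonal entries to the power r, leave zeros at zero) can be sketched as follows; restricting the example to nonnegative entries avoids complex values for fractional r and is an assumption of this sketch. It shows that $D^{-1}$ in this sense coincides with the pseudoinverse $D^+$ and that $D^rD^s = D^{r+s}$ holds even with a zero entry.

```python
import numpy as np

def diag_power(d, r):
    """D^r in the sense of Definition 1.9: d_i**r where d_i != 0, and 0 where d_i == 0."""
    d = np.asarray(d, dtype=float)
    out = np.zeros_like(d)
    nz = d != 0
    out[nz] = d[nz] ** r
    return np.diag(out)

d = [4.0, 1.0, 0.0]
print(np.allclose(diag_power(d, -1), np.linalg.pinv(np.diag(d))))                 # D^{-1} = D^+
print(np.allclose(diag_power(d, 0.5) @ diag_power(d, -0.5), diag_power(d, 0.0)))  # D^r D^s = D^{r+s}
```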

For the next two theorems, consider the linear model

$$Y = AX$$

and the weighted least squares solution $\hat X$, which is such that

$$(Y - A\hat X)^TW(Y - A\hat X) \le (Y - AX^*)^TW(Y - AX^*)$$

for all other $X^*$ and positive semidefinite weight matrix W. Since the weight function will appear only in a quadratic form, W can be assumed to be symmetric without loss of generality. Letting $W = W^{(1/2)T}W^{1/2}$ be a factorization of W [16, p. 4], then

$$(Y - AX)^TW(Y - AX) = (Y - AX)^TW^{(1/2)T}W^{1/2}(Y - AX) = \left(W^{1/2}Y - W^{1/2}AX\right)^T\left(W^{1/2}Y - W^{1/2}AX\right),$$

and the least squares solution is

$$\hat X = (W^{1/2}A)^+W^{1/2}Y = \left(A^TW^{(1/2)T}W^{1/2}A\right)^+A^TW^{(1/2)T}W^{1/2}Y = (A^TWA)^+A^TWY \qquad (1.4)$$

[16], [28], [42], [43], [44].
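The two expressions in (1.4) can be compared directly, even for a rank-deficient A and a weight matrix with a zero weight. The data below are random and hypothetical, and the `rcond=1e-10` cutoff (to treat numerically zero singular values as zero) is a choice of this sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3))
A[:, 2] = A[:, 0] + A[:, 1]                      # rank-deficient on purpose
Y = rng.standard_normal(6)
w = np.array([0.3, 0.2, 0.0, 0.1, 0.25, 0.15])   # diagonal psd weights (one zero)
W = np.diag(w)
W_half = np.diag(np.sqrt(w))                     # symmetric square root: W = W_half^T W_half

x_a = np.linalg.pinv(W_half @ A, rcond=1e-10) @ (W_half @ Y)
x_b = np.linalg.pinv(A.T @ W @ A, rcond=1e-10) @ (A.T @ W @ Y)
print(np.allclose(x_a, x_b))                     # the two forms in (1.4) agree

def weighted_sse(x):
    """(Y - A x)^T W (Y - A x)."""
    r = Y - A @ x
    return r @ W @ r
```

The helper `weighted_sse` can be used to confirm that no other $X^*$ gives a smaller weighted residual than $\hat X$.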

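THEOREM 1.10: Suppose Y, A, and W are partitioned as

$$Y = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}, \qquad A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}, \qquad W = \begin{bmatrix} W_1 & 0 \\ 0 & W_2 \end{bmatrix},$$

with dimensions of $Y_1$, $A_1$, and $W_1$ as well as $Y_2$, $A_2$, and $W_2$ corresponding. Further suppose that $W_2$ is nonsingular. Then if the weighted least squares solution $\hat X$ is such that $Y_2 - A_2\hat X = 0$, then $\hat X = \hat X_1$, where $\hat X_1$ is the weighted least squares solution using the model $Y_1 = A_1X$ and weights $W_1$.

Proof: Observe first that

$$\begin{aligned}
A^TWA &= \begin{bmatrix} A_1^T & A_2^T \end{bmatrix}\begin{bmatrix} W_1 & 0 \\ 0 & W_2 \end{bmatrix}\begin{bmatrix} A_1 \\ A_2 \end{bmatrix} = A_1^TW_1A_1 + A_2^TW_2A_2 \\
&= A_1^TW_1A_1 + A_2^TW_2^{(1/2)T}W_2^{1/2}A_2 = A_1^TW_1A_1 + \left(W_2^{1/2}A_2\right)^T\left(W_2^{1/2}A_2\right),
\end{aligned}$$

so that

$$N(A^TWA) = N(A_1^TW_1A_1) \cap N\left(W_2^{1/2}A_2\right) \subseteq N\left(W_2^{1/2}A_2\right)$$

by Lemma 1.6, so that Corollary 1.8 will apply to $A^TWA$, and

$$(A^TWA)^+ = (A_1^TW_1A_1)^+ - (A_1^TW_1A_1)^+A_2^T\left[W_2^{-1} + A_2(A_1^TW_1A_1)^+A_2^T\right]^{-1}A_2(A_1^TW_1A_1)^+,$$

since $W_2$ is nonsingular and therefore $W_2^{1/2}$ is nonsingular.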
Consequently, consider the condition $Y_2 - A_2\hat X = 0$. For brevity, write $M = (A_1^TW_1A_1)^+$ and $K = \left[W_2^{-1} + A_2MA_2^T\right]^{-1}$, so that $A_2MA_2^T = K^{-1} - W_2^{-1}$ and $\hat X_1 = MA_1^TW_1Y_1$. Then

$$\begin{aligned}
Y_2 - A_2\hat X &= Y_2 - A_2(A^TWA)^+A^TWY = Y_2 - A_2\left[M - MA_2^TKA_2M\right]\left[A_1^TW_1Y_1 + A_2^TW_2Y_2\right] \\
&= Y_2 - A_2\hat X_1 + A_2MA_2^TKA_2\hat X_1 - A_2MA_2^TW_2Y_2 + A_2MA_2^TKA_2MA_2^TW_2Y_2 \\
&= Y_2 - A_2\hat X_1 + A_2MA_2^TKA_2\hat X_1 - A_2MA_2^TKY_2 \\
&= \left[I - A_2MA_2^TK\right]\left[Y_2 - A_2\hat X_1\right] = W_2^{-1}\left[W_2^{-1} + A_2(A_1^TW_1A_1)^+A_2^T\right]^{-1}\left[Y_2 - A_2\hat X_1\right]. \qquad (1.5)
\end{aligned}$$

Therefore, since $W_2$ is nonsingular, $Y_2 - A_2\hat X = 0$ implies $Y_2 - A_2\hat X_1 = 0$. To conclude the theorem, calculate

$$\begin{aligned}
\hat X &= (A^TWA)^+A^TWY = \left[M - MA_2^TKA_2M\right]\left[A_1^TW_1Y_1 + A_2^TW_2Y_2\right] \\
&= \hat X_1 + MA_2^TW_2Y_2 - MA_2^TKA_2\hat X_1 - MA_2^TKA_2MA_2^TW_2Y_2 \\
&= \hat X_1 + MA_2^TW_2Y_2 - MA_2^TKA_2\hat X_1 - MA_2^TW_2Y_2 + MA_2^TKY_2 \\
&= \hat X_1 + (A_1^TW_1A_1)^+A_2^T\left[W_2^{-1} + A_2(A_1^TW_1A_1)^+A_2^T\right]^{-1}\left[Y_2 - A_2\hat X_1\right] = \hat X_1.
\end{aligned}$$
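Theorem 1.10 can be illustrated numerically: solve the reduced model $Y_1 = A_1X$ with weights $W_1$, choose $Y_2 = A_2\hat X_1$ so that the second block is fitted exactly, and check that the full weighted solution reproduces $\hat X_1$. Choosing a full-column-rank $A_1$ (so the reduced solution is unique), the random data, and the helper `wls` are assumptions of this sketch.

```python
import numpy as np

def wls(A, Y, W):
    """Weighted least squares solution (A^T W A)^+ A^T W Y, as in (1.4)."""
    return np.linalg.pinv(A.T @ W @ A, rcond=1e-10) @ (A.T @ W @ Y)

rng = np.random.default_rng(4)
A1, A2 = rng.standard_normal((5, 3)), rng.standard_normal((2, 3))
W1 = np.diag(rng.uniform(0.5, 1.5, 5))
W2 = np.diag(rng.uniform(0.5, 1.5, 2))             # nonsingular, as the theorem requires
Y1 = rng.standard_normal(5)
X1 = wls(A1, Y1, W1)                                # solution of the reduced model Y1 = A1 X
Y2 = A2 @ X1                                        # second block fitted exactly
A = np.vstack([A1, A2])
Y = np.concatenate([Y1, Y2])
W = np.block([[W1, np.zeros((5, 2))], [np.zeros((2, 5)), W2]])
X = wls(A, Y, W)
print(np.allclose(A2 @ X, Y2), np.allclose(X, X1))  # True True
```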
LEMMA 1.11: Let $g = (Y - A\hat X)/\left[(Y - A\hat X)^T(Y - A\hat X)\right]^{1/2}$, where $\hat X$ is the least squares estimate for a nonweighted model (W = I). Then

a) $g^Tg = 1$;

b) $A^Tg = 0$;

c) for any h such that h satisfies a) and b), $Y^Th \le Y^Tg$;

d) $\left[(Y - A\hat X)^T(Y - A\hat X)\right]^{1/2} = Y^T(Y - A\hat X)/\left[(Y - A\hat X)^T(Y - A\hat X)\right]^{1/2}$.

Proof:

a) $$g^Tg = \frac{(Y - A\hat X)^T(Y - A\hat X)}{\left[(Y - A\hat X)^T(Y - A\hat X)\right]^{1/2}\left[(Y - A\hat X)^T(Y - A\hat X)\right]^{1/2}} = 1.$$

b) Observe that $\hat X = (A^TA)^+A^TY$, or that $(A^TA)\hat X = A^TY$, since $A^TY \in R(A^T) = R(A^TA)$ and $(A^TA)(A^TA)^+$ is the projection onto $R(A^TA)$. Then

$$A^Tg = A^T(Y - A\hat X)/\left[(Y - A\hat X)^T(Y - A\hat X)\right]^{1/2} = (A^TY - A^TA\hat X)/\left[(Y - A\hat X)^T(Y - A\hat X)\right]^{1/2} = 0.$$

c) If $A^Th = 0$, then $h \in N(A^T) = N(A^+)$, so that $(I - AA^+)h = h$, and h can be written as $(I - AA^+)Z$ for Z any element of the whole space. If $h^Th = 1$, then h must be of the form

$$h = (I - AA^+)Z/\left\{\left[(I - AA^+)Z\right]^T\left[(I - AA^+)Z\right]\right\}^{1/2} = (I - AA^+)Z/\left[Z^T(I - AA^+)Z\right]^{1/2}.$$

The Cauchy-Schwarz inequality is

$$u^Tv \le (u^Tu)^{1/2}(v^Tv)^{1/2},$$

so that with $u = (I - AA^+)Y$ and $v = (I - AA^+)Z$ the inequality becomes

$$Y^T(I - AA^+)(I - AA^+)Z \le \left[Y^T(I - AA^+)Y\right]^{1/2}\left[Z^T(I - AA^+)Z\right]^{1/2},$$

which is

$$Y^T(I - AA^+)Z/\left[Z^T(I - AA^+)Z\right]^{1/2} \le \left[Y^T(I - AA^+)Y\right]^{1/2}.$$

Therefore

$$Y^Th \le \left[Y^T(I - AA^+)Y\right]^{1/2} = Y^T(I - AA^+)Y/\left[Y^T(I - AA^+)Y\right]^{1/2} = Y^T(Y - A\hat X)/\left[Y^T(I - AA^+)Y\right]^{1/2} = Y^Tg,$$

since $A^+Y = \hat X$.

d) $$\begin{aligned}
(Y - A\hat X)^T(Y - A\hat X) &= Y^T(Y - A\hat X) - \hat X^TA^T(Y - A\hat X) \\
&= Y^T(Y - A\hat X) - \hat X^T(A^TY - A^TA\hat X) = Y^T(Y - A\hat X).
\end{aligned}$$

Therefore

$$\left[(Y - A\hat X)^T(Y - A\hat X)\right]^{1/2} = \frac{(Y - A\hat X)^T(Y - A\hat X)}{\left[(Y - A\hat X)^T(Y - A\hat X)\right]^{1/2}} = \frac{Y^T(Y - A\hat X)}{\left[(Y - A\hat X)^T(Y - A\hat X)\right]^{1/2}}.$$
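The four statements of Lemma 1.11 can be checked on random data: g is a unit vector orthogonal to the columns of A, no unit vector h in $N(A^T)$ gives a larger $Y^Th$ (tested here on random samples, which is an illustration rather than a proof), and the residual norm identity d) holds. The sample count and random data are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 2))
Y = rng.standard_normal(6)
X_hat = np.linalg.pinv(A) @ Y                # least squares estimate, W = I
r = Y - A @ X_hat
g = r / np.sqrt(r @ r)

P = np.eye(6) - A @ np.linalg.pinv(A)        # projection onto N(A^T)
best = Y @ g
worst_gap = np.inf
for _ in range(1000):
    z = P @ rng.standard_normal(6)
    h = z / np.sqrt(z @ z)                   # h^T h = 1 and A^T h = 0
    worst_gap = min(worst_gap, best - Y @ h)

print(np.isclose(g @ g, 1.0), np.allclose(A.T @ g, 0.0))       # a), b)
print(worst_gap >= -1e-12)                                      # c)
print(np.isclose(np.sqrt(r @ r), (Y @ r) / np.sqrt(r @ r)))     # d)
```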
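LEMMA 1.12 (Hölder Inequality in matrix trace notation): For real numbers $a_i \ge 0$, $b_i \ge 0$, $i = 1, \dots, n$, and r and s such that $(1/r) + (1/s) = 1$, if $D_1 = \operatorname{diag}\{a_i\}$ and $D_2 = \operatorname{diag}\{b_i\}$, then

$$\operatorname{tr}(D_1D_2) \le \left[\operatorname{tr}(D_1^r)\right]^{1/r}\left[\operatorname{tr}(D_2^s)\right]^{1/s},$$

with equality if and only if there is a real constant α such that

$$D_1^r = \alpha D_2^s,$$

where $\operatorname{tr}(D) = \sum_{i=1}^{n} d_i$ for any diagonal matrix D.

Proof:

$$\operatorname{tr}(D_1D_2) = \sum_{i=1}^{n} a_ib_i \le \left(\sum_{i=1}^{n} a_i^r\right)^{1/r}\left(\sum_{i=1}^{n} b_i^s\right)^{1/s} = \left[\operatorname{tr}(D_1^r)\right]^{1/r}\left[\operatorname{tr}(D_2^s)\right]^{1/s},$$

with equality if and only if there is a real constant α such that $a_i^r = \alpha b_i^s$ for $i = 1, \dots, n$, which is true if and only if $D_1^r = \alpha D_2^s$, by the discrete Hölder Inequality [18, pp. 21-26].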
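A quick numerical illustration of Lemma 1.12: for conjugate exponents r and s the trace inequality holds, and it becomes an equality when $a_i^r = \alpha b_i^s$. The exponent r = 3, the constant α = 2, and the random data are illustrative choices of this sketch.

```python
import numpy as np

rng = np.random.default_rng(6)
a, b = rng.uniform(0, 2, 8), rng.uniform(0, 2, 8)
r = 3.0
s = r / (r - 1.0)                              # conjugate exponent: 1/r + 1/s = 1

lhs = np.trace(np.diag(a) @ np.diag(b))
rhs = np.sum(a**r) ** (1 / r) * np.sum(b**s) ** (1 / s)
print(lhs <= rhs)

# Equality case: choose a with a_i^r = alpha * b_i^s.
alpha = 2.0
a_eq = (alpha * b**s) ** (1 / r)
rhs_eq = np.sum(a_eq**r) ** (1 / r) * np.sum(b**s) ** (1 / s)
print(np.isclose(np.sum(a_eq * b), rhs_eq))
```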

LEMMA 1.13: If a sequential process is defined by $x_{n+1} = F(y_n)$ with F continuous, then if

$$\lim_{n\to\infty} x_n = x, \qquad \lim_{n\to\infty} y_n = y,$$

then $F(y) = x$.

Proof: F continuous, $\lim x_n = x$, and $\lim y_n = y$ imply that for every $\varepsilon > 0$ there are constants $N_1$, $N_2$, and $N_3$ such that

a) if $n > N_1$, $|F(y_n) - F(y)| < \varepsilon/3$;

b) if $n > N_2$, $|x_{n+1} - x_n| < \varepsilon/3$;

c) if $n > N_3$, $|x_n - x| < \varepsilon/3$.

Pick $N = \max\{N_1, N_2, N_3\}$ so that for $n > N$,

$$\begin{aligned}
|F(y) - x| &= |F(y) - F(y_n) + F(y_n) - x_n + x_n - x| \\
&\le |F(y) - F(y_n)| + |F(y_n) - x_n| + |x_n - x| \\
&= |F(y) - F(y_n)| + |x_{n+1} - x_n| + |x_n - x| < \varepsilon
\end{aligned}$$

for all $\varepsilon$, so that $F(y) = x$ since $|F(y) - x|$ is independent of n.
LEMMA 1.14: The function

$$F(x) = \left[1 + (x/c)^r\right]^s - 1$$

approaches zero more rapidly than does x when $r > 1 > s$ and c is a positive constant. In fact the rate is $x^{r-1}$.

Proof: For x close to zero and since $s < 1$, the infinite binomial series for F(x) converges, to yield

$$\begin{aligned}
F(x) &= 1 + s(x/c)^r + s(s-1)(x/c)^{2r}/2! + \cdots - 1 \\
&= (x/c)\left[s(x/c)^{r-1} + s(s-1)(x/c)^{2r-1}/2! + \cdots\right],
\end{aligned}$$

so that

$$\lim_{x\to 0} F(x)/x = \lim_{x\to 0}\left[s(x/c)^{r-1} + h(x^{2r-1})\right]/c = 0,$$

where $h(x^{2r-1})$ are higher order terms of the order $x^{2r-1}$ or higher. Consequently, F(x) approaches zero more rapidly than x and at a rate of $x^{r-1}$.
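The rate in Lemma 1.14 is visible numerically: $F(x)/x$ goes to zero, while $F(x)/x^r$ settles at the leading coefficient $s/c^r$ of the binomial series. The parameter values are illustrative (chosen with $r > 1 > s$), and evaluating F through `expm1`/`log1p` to avoid cancellation near zero is an implementation choice of this sketch.

```python
import numpy as np

def F(x, r, s, c):
    """F(x) = [1 + (x/c)^r]^s - 1, computed with log1p/expm1 to avoid cancellation."""
    return np.expm1(s * np.log1p((x / c) ** r))

r, s, c = 3.0, 0.5, 2.0                 # illustrative values with r > 1 > s
xs = np.array([1e-1, 1e-2, 1e-3])
ratio = F(xs, r, s, c) / xs             # tends to 0 as x -> 0
scaled = F(xs, r, s, c) / xs**r         # approaches s / c**r
print(ratio, scaled, s / c**r)
```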



3.2 Calculation of $\ell_p$ Approximations, $p > 2$

In this section an algorithm is presented to obtain the $\ell_p$ approximations to the degenerate system

$$Y = AX,$$

which is that value (or values) of the n×1 dimensional vector X, say $\hat X^p$, such that

$$Q^p(\hat X^p) \le Q^p(X),$$

where X is any value of X, with

$$Q^p(X) = \operatorname{tr}\left[S(X)^p\right], \qquad S(X) = \operatorname{diag}\left\{|Y_i - A^iX|\right\},$$

and $A^i$ the $i$th row of A.

The solutions $\hat X^p$ will be characterized by selecting a weighting matrix $W = \operatorname{diag}\{w_i \ge 0\}$, $\operatorname{tr}[W] = 1$, so that $\hat X^p$ is the solution of

$$\left\{\operatorname{tr}\left[WS(\hat X^p)^2\right]\right\}^{1/2} \le \left\{\operatorname{tr}\left[WS(X)^2\right]\right\}^{1/2} \qquad (2.1)$$

for all X. The solutions to (2.1) are least squares solutions to

$$W^{1/2}Y = W^{1/2}AX,$$

which are

$$\hat X = (W^{1/2}A)^+W^{1/2}Y + \left[I - (W^{1/2}A)^+(W^{1/2}A)\right]Z = (A^TWA)^+A^TWY + \left[I - (W^{1/2}A)^+(W^{1/2}A)\right]Z, \qquad (2.2)$$

where Z is an arbitrary n×1 vector [9], [27], [41], [42], [44].

To illustrate the heuristics behind the algorithm, consider the following theorem.

THEOREM 2.1: If $\hat X^p$ is a weighted $\ell_p$ approximation, then it is a weighted $\ell_r$ approximation for $p, r > 0$.

Proof: $\hat X^p$ is a weighted $\ell_p$ approximation if and only if, for the weights $W = \operatorname{diag}\{w_i\}$, letting $U = WS(\hat X^p)^{p-r}$,

$$Q_U^r(\hat X^p) = \operatorname{tr}\left[US(\hat X^p)^r\right] = \operatorname{tr}\left[WS(\hat X^p)^{p-r}S(\hat X^p)^r\right] = \operatorname{tr}\left[WS(\hat X^p)^p\right] \le \operatorname{tr}\left[WS(X^*)^p\right]$$

for all $X^*$, so that $\hat X^p$ is the weighted $\ell_r$ approximation with weights U. Notice that if $r > p$, U has points of possible singularity at $Y_i = A^i\hat X^p$. If such a singularity occurs for some subset J of $\{1, \dots, m\}$, set $U_2 = 0$. With $U_1 = W_1S_1(\hat X^p)^{p-r}$ now restricted to the nonzero subset, then

$$\begin{aligned}
Q_W^p(\hat X^p) &= \operatorname{tr}\left[WS(\hat X^p)^p\right] = \operatorname{tr}\left[W_1S_1(\hat X^p)^p\right] = \operatorname{tr}\left[U_1S_1(\hat X^p)^r\right] \\
&= \operatorname{tr}\left[U_1S_1(\hat X^p)^r\right] + \operatorname{tr}\left[U_2S_2(\hat X^p)^r\right] = \operatorname{tr}\left[US(\hat X^p)^r\right] = Q_U^r(\hat X^p),
\end{aligned}$$

since $U_2 = 0 = S_2(\hat X^p)$ if $r > 0$.

For the discussion in this paper, consider only $\ell_p$ spaces for $1 < p < +\infty$. Algorithms for $p = +\infty$ can be found in references [11], [21], [23], [31], and [32], for $-\infty < p < 1$ in [12], and for $p = 1$ in [11] and [52]. The algorithm defined below is for $2 < p < \infty$.

DEFINITION 2.2: Let $W_1 = \operatorname{diag}\{w_1^i > 0\}$ be arbitrary except subject to $\operatorname{tr}[W_1] = 1$. Define

$$\sigma^k = \frac{\left\{\operatorname{tr}\left[W_kS(\hat X_k^p)^2\right]\right\}^{1/2}}{\left\{\operatorname{tr}\left[W_k^{p/(p-2)}\right]\right\}^{(p-2)/2p}} = \frac{\left[Q_{W_k}^2(\hat X_k^p)\right]^{1/2}}{\left\{\operatorname{tr}\left[W_k^{p/(p-2)}\right]\right\}^{(p-2)/2p}}, \qquad J_k = \{i : w_k^i > 0\},$$

where $\hat X_k^p$ is the weighted least squares solution with weights $W_k$ minimizing $Q_{W_k}^2(X) = \operatorname{tr}\left[W_kS(X)^2\right]$.

The algorithm, step by step, is then to

a) Calculate, for k = 1,

i) $\hat X_1^p$;

ii) $\eta_1 = \left\{\operatorname{tr}\left[W_1S(\hat X_1^p)^2\right]\right\}^{1/2}$;

iii) if $\eta_1 = 0$, terminate the algorithm; otherwise calculate $\delta_1 = \left\{\operatorname{tr}\left[W_1^{p/(p-2)}\right]\right\}^{(p-2)/2p}$;

iv) $\sigma^1 = \eta_1/\delta_1$.

b) Calculate, for the kth step, k > 1,

i) $W_k$;

ii) $\hat X_k^p$;

iii) $\eta_k = \left\{\operatorname{tr}\left[W_kS(\hat X_k^p)^2\right]\right\}^{1/2}$;

iv) if $\eta_k = 0$, terminate the algorithm; otherwise calculate $\delta_k = \left\{\operatorname{tr}\left[W_k^{p/(p-2)}\right]\right\}^{(p-2)/2p}$;

v) $\sigma^k = \eta_k/\delta_k$;

vi) if $|\sigma^k - \sigma^{k-1}| < \varepsilon$, a predetermined constant, terminate the algorithm; otherwise return to b) i).

One value for $W_1$ is $\operatorname{diag}\{1/m, \dots, 1/m\}$. Since the elements of the two diagonal matrices $W_k$ and $S(\hat X_k^p)$ are nonnegative, then if $\eta_k = 0$,

$$w_k^i\,\left|Y_i - A^i\hat X_k^p\right|^2 = 0 \qquad (2.3)$$

for $i = 1, 2, \dots, m$, and since $w_k^i > 0$ for $i \in J_k$, then $Y_i - A^i\hat X_k^p = 0$ for all nonzero weights. Notice that if $\eta_k \ne 0$, then $\sigma^k$ is well defined, and there is at least one term $w_k^i\,|Y_i - A^i\hat X_k^p|$ nonzero, implying that $W_{k+1}$ is well defined. Observe that if $w_{k+1}^i = 0$ for some i, then the corresponding term in $W_{k+1}$, a positive power of $w_k^i\,|Y_i - A^i\hat X_k^p|$, is zero. This forces either $w_k^i = 0$ or $|Y_i - A^i\hat X_k^p| = 0$ for $p > 2$, implying that if $w_k^i = 0$, then $w_{k+1}^i = 0$. The algorithm is therefore a well-defined procedure. It will be shown that if $\eta_1 > 0$,
1. $\lim_{k\to\infty} \sigma^k = \left[Q^p(\hat X^p)\right]^{1/p}$;

2. $\lim_{k\to\infty} \hat X_k^p = \hat X^p$;

3. $\lim_{k\to\infty} J_k = J$;

where $\hat X^p$ is the nonweighted $\ell_p$ solution on J. If $J \ne \{1, \dots, m\}$, then a procedure will be presented to increase J in a finite number of steps to $\{1, \dots, m\}$. This algorithm was developed from the algorithms of references [23] and [46] and appears in another version for $L_p$ spaces, p even, in reference [21].
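The steps above can be sketched as follows. The weighted least squares step uses the minimum-norm solution of (2.2) (Z = 0), and $\eta_k$, $\delta_k$, $\sigma^k$ follow Definition 2.2. The weight update is not recoverable from this copy of the text, so the rule used here, $w_{k+1}^i \propto (w_k^i\,|Y_i - A^i\hat X_k^p|)^{(p-2)/(p-1)}$ normalized to $\operatorname{tr}[W_{k+1}] = 1$, is a hypothetical choice (its fixed points satisfy $w_i \propto |r_i|^{p-2}$, the first-order condition for an $\ell_p$ solution) and should not be read as the dissertation's formula. The function name `lp_sigma_iteration`, the iteration cap, and the test data are likewise assumptions.

```python
import numpy as np

def lp_sigma_iteration(A, Y, p, iters=200):
    """Sketch of the iteration of Definition 2.2 for 2 < p < inf.

    ASSUMPTION: the weight update w_{k+1} ~ (w_k |r_k|)^((p-2)/(p-1)) is a
    hypothetical choice; the original update formula is not reproduced here.
    """
    m = A.shape[0]
    w = np.full(m, 1.0 / m)                       # W_1 = diag{1/m, ..., 1/m}
    sigmas, X = [], None
    for _ in range(iters):
        sw = np.sqrt(w)
        # weighted least squares, minimum-norm solution of (2.2) with Z = 0
        X = np.linalg.lstsq(sw[:, None] * A, sw * Y, rcond=None)[0]
        r = np.abs(Y - A @ X)
        eta = np.sqrt(np.sum(w * r**2))
        if eta == 0.0:                            # exact fit on the nonzero weights
            break
        delta = np.sum(w ** (p / (p - 2))) ** ((p - 2) / (2 * p))
        sigmas.append(eta / delta)
        v = (w * r) ** ((p - 2) / (p - 1))        # hypothetical update (see docstring)
        w = v / v.sum()                           # keep tr[W] = 1
    return X, np.array(sigmas)

rng = np.random.default_rng(7)
A = rng.standard_normal((20, 3))
Y = rng.standard_normal(20)
X, sig = lp_sigma_iteration(A, Y, p=4.0)
print(sig[:3], sig[-1], np.sum(np.abs(Y - A @ X) ** 4) ** 0.25)
```

Whatever weights are used, Hölder's inequality (Lemma 1.12) gives $\sigma^k \le \|Y - AX^*\|_p$ for every $X^*$, so each printed $\sigma^k$ is a lower bound for the $\ell_p$ residual of the returned X.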

LEMMA 2.3: If $\eta_1 > 0$, then $\eta_k > 0$ for all k.

Proof: By induction, since $\eta_1 > 0$ is given, assume that $\eta_k > 0$, which implies that there is a $w_k^i > 0$ and therefore $J_k$ is nonempty. Two possibilities occur: either $J_{k+1} = J_k$ or $J_k - J_{k+1}$ is nonempty. The third possibility, $J_{k+1} - J_k$ nonempty, implies there is an i such that $w_k^i = 0$ but $w_{k+1}^i > 0$, which is impossible from the remarks after Definition 2.2.

a) If $J_{k+1} = J_k$, then $\eta_{k+1} = 0$ implies from the remarks after Definition 2.2 that

$$w_{k+1}^i\,\left|Y_i - A^i\hat X_{k+1}^p\right|^2 = 0$$

for all i, so that since $w_{k+1}^i > 0$ for $i \in J_{k+1}$,

$$\left|Y_i - A^i\hat X_{k+1}^p\right| = 0, \qquad i \in J_{k+1}.$$

Define the submatrices

$$S_1(X) = \operatorname{diag}\left\{|Y_{i(j)} - A^{i(j)}X|\right\},\ i(j) \in J_{k+1}; \qquad S_2(X) = \operatorname{diag}\left\{|Y_{i(j)} - A^{i(j)}X|\right\},\ i(j) \notin J_{k+1};$$

$$W_{k+1,1} = \operatorname{diag}\left\{w_{k+1}^{i(j)}\right\},\ i(j) \in J_{k+1}; \qquad W_{k+1,2} = \operatorname{diag}\left\{w_{k+1}^{i(j)}\right\},\ i(j) \notin J_{k+1},$$

where $i(1)$ is the smallest integer in the set and the $i(j)$ are taken in increasing order. Then $S_1(\hat X_{k+1}^p) = 0$ and $W_{k+1,2} = 0$, and since $J_{k+1} = J_k$, $W_{k,2} = 0$ also, making

$$0 = W_{k+1,1}S_1(\hat X_{k+1}^p) = W_{k,1}S_1(\hat X_{k+1}^p), \qquad 0 = W_{k+1,2}S_2(\hat X_{k+1}^p) = W_{k,2}S_2(\hat X_{k+1}^p).$$

Since $\hat X_k^p$ is the least squares estimate with weights $W_k$,

$$\begin{aligned}
\eta_k^2 = \operatorname{tr}\left[W_kS(\hat X_k^p)^2\right] &\le \operatorname{tr}\left[W_kS(\hat X_{k+1}^p)^2\right] \\
&= \operatorname{tr}\left[W_{k,1}S_1(\hat X_{k+1}^p)^2\right] + \operatorname{tr}\left[W_{k,2}S_2(\hat X_{k+1}^p)^2\right] = 0,
\end{aligned}$$

which contradicts the assumption that $\eta_k > 0$.

b) If $J_k - J_{k+1}$ is nonempty and if $i \in J_k - J_{k+1}$, then $w_k^i > 0$ and $w_{k+1}^i = 0$. Define the submatrices

$$W_{k,1} = \operatorname{diag}\left\{w_k^{i(j)}\right\}, \qquad i(j) \notin J_k - J_{k+1},$$

for $i(j)$ the ordering as in part a), with $n_1$ the number of indices not in $J_k - J_{k+1}$. Define $W_{k,2}$ in a similar manner, for $i(j) \in J_k - J_{k+1}$, with $n_2$ the number of indices in $J_k - J_{k+1}$. From the remarks after Definition 2.2, $w_{k+1}^i = 0$ implies that either $w_k^i = 0$ or $|Y_i - A^i\hat X_k^p| = 0$, so that for $i \in J_k - J_{k+1}$, $|Y_i - A^i\hat X_k^p| = 0$ since $w_k^i > 0$. Therefore, with $S_1$ and $S_2$ partitioned compatibly with $A_1$ and $A_2$, $Y_2 - A_2\hat X_k^p = 0$ and

$$S_2(\hat X_k^p) = \operatorname{diag}\left\{|Y_{j(i)} - A^{j(i)}\hat X_k^p|,\ j(i) \in J_k - J_{k+1}\right\} = 0.$$

Since the diagonal elements of $W_{k,2}$ are positive, $W_{k,2}$ is nonsingular and the system satisfies the hypotheses of Theorem 1.10, implying that $\hat X_k^p$ is also the least squares estimate over the reduced system $Y_1 = A_1X$. Therefore, similar to part a),

$$\begin{aligned}
\eta_k^2 = \operatorname{tr}\left[W_kS(\hat X_k^p)^2\right] &= \operatorname{tr}\left[W_{k,1}S_1(\hat X_k^p)^2\right] + \operatorname{tr}\left[W_{k,2}S_2(\hat X_k^p)^2\right] = \operatorname{tr}\left[W_{k,1}S_1(\hat X_k^p)^2\right] \\
&\le \operatorname{tr}\left[W_{k,1}S_1(\hat X_{k+1}^p)^2\right] + \operatorname{tr}\left[W_{k+1,2}S_2(\hat X_{k+1}^p)^2\right],
\end{aligned}$$

since $W_{k+1,2} = 0$. Observe that the diagonal elements of $W_{k,1}$ were obtained from the complement of $J_k - J_{k+1}$. Since the only nonzero weights at the kth iteration are indexed by $J_k$, the nonzero diagonal elements of $W_{k,1}$ must also be indexed by $J_{k+1}$. If $\eta_{k+1} = 0$, then, similar to a),

$$w_{k+1}^i\,\left|Y_i - A^i\hat X_{k+1}^p\right|^2 = 0$$

for all i. Since $w_{k+1}^i > 0$ for $i \in J_{k+1}$, then $Y_i - A^i\hat X_{k+1}^p = 0$, implying that $W_{k,1}S_1(\hat X_{k+1}^p) = 0$. Thus

$$\eta_k^2 \le \operatorname{tr}\left[W_{k,1}S_1(\hat X_{k+1}^p)^2\right] + \operatorname{tr}\left[W_{k+1,2}S_2(\hat X_{k+1}^p)^2\right] = 0,$$

contradicting that $\eta_k > 0$. Therefore $\eta_{k+1} > 0$.



Lemma 2.3 implies that the $J_k$ are nested; that is, $J_1 \supseteq J_2 \supseteq \cdots \supseteq J_k \supseteq J_{k+1} \supseteq \cdots$. Also, it was shown that if $\eta_k = 0$, then $|Y_i - A^i\hat X_k^p| = 0$ for all nonzero weights. This lemma then implies that if the algorithm does not exactly fit (interpolate) the vector Y with the original weights, the algorithm will not exactly fit Y at any subsequent step. For the full rank case ($r(A) = n$), where every n×n submatrix is of rank n (the Haar condition [6, p. 7^]), J must contain at least n + 1 points, since any n×n submatrix of rank n will interpolate Y. Since the $J_k$ are nested with a finite number of elements,

$$J = \bigcap_{i=1}^{\infty} J_i, \qquad \lim_{i\to\infty} \operatorname{num}(J_i) = \operatorname{num}(J),$$

where $\operatorname{num}(\cdot)$ stands for the number of elements in the set. Therefore the limit must be attained. The following lemma proves that $\sigma^k > 0$ is a strictly monotonically increasing sequence, so that J is nonempty, since J empty implies $\eta_k = 0$ for large k.



LEMMA 2.4: If $W_{k+1} = W_k$, then $\sigma_{k+1} = \sigma_k$. Otherwise $\sigma_{k+1} > \sigma_k$.

Proof: From equation (2.1), if $W_k = W_{k+1}$, then the values $\hat X^p_k$ equal the values $\hat X^p_{k+1}$. Consequently, from Definition 2.2, $\sigma_k = \sigma_{k+1}$.

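Suppose then that $W_k \ne W_{k+1}$ and consider
$$g_{k+1} = \frac{W_{k+1}^{-1/2}\bigl(Y^*_{k+1} - A^*_{k+1}\hat X^p_{k+1}\bigr)}{\bigl\|Y^*_{k+1} - A^*_{k+1}\hat X^p_{k+1}\bigr\|_2},$$
where $Y^*_{k+1} = W_{k+1}^{1/2}Y$ and $A^*_{k+1} = W_{k+1}^{1/2}A$. From Lemma 1.11:

a)
$$g_{k+1}^TW_{k+1}g_{k+1} = \frac{\bigl[W_{k+1}^{1/2}W_{k+1}^{-1/2}\bigl(Y^*_{k+1} - A^*_{k+1}\hat X^p_{k+1}\bigr)\bigr]^T\bigl[W_{k+1}^{1/2}W_{k+1}^{-1/2}\bigl(Y^*_{k+1} - A^*_{k+1}\hat X^p_{k+1}\bigr)\bigr]}{\bigl\|Y^*_{k+1} - A^*_{k+1}\hat X^p_{k+1}\bigr\|_2^2} = 1.$$
This holds because $Y^*_{k+1} - A^*_{k+1}\hat X^p_{k+1} = W_{k+1}^{1/2}\bigl(Y - A\hat X^p_{k+1}\bigr)$, so that
$$W_{k+1}^{1/2}W_{k+1}^{-1/2}W_{k+1}^{1/2}\bigl(Y - A\hat X^p_{k+1}\bigr) = W_{k+1}^{1/2}\bigl(Y - A\hat X^p_{k+1}\bigr),$$
as $W_{k+1}^{-1/2}$ is the pseudoinverse of $W_{k+1}^{1/2}$ by the remarks after Definition 1.9.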


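b)
$$A^TW_{k+1}g_{k+1} = \frac{A^TW_{k+1}^{1/2}\,W_{k+1}^{1/2}W_{k+1}^{-1/2}\bigl(Y^*_{k+1} - A^*_{k+1}\hat X^p_{k+1}\bigr)}{\bigl\|Y^*_{k+1} - A^*_{k+1}\hat X^p_{k+1}\bigr\|_2} = \frac{A^{*T}_{k+1}\bigl(Y^*_{k+1} - A^*_{k+1}\hat X^p_{k+1}\bigr)}{\bigl\|Y^*_{k+1} - A^*_{k+1}\hat X^p_{k+1}\bigr\|_2} = \Phi$$
by the normal equations. This shows that $g_{k+1}$ is orthogonal with weight $W_{k+1}$ to the space of columns of $A$.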


c) Let $h$ be any vector orthogonal with weight $W_{k+1}$ to the columns of $A$ with $h^TW_{k+1}h = 1$. Defining $h^*_{k+1} = W_{k+1}^{1/2}h$ and $g^*_{k+1} = W_{k+1}^{1/2}g_{k+1}$, we find
$$Y^TW_{k+1}h = Y^{*T}_{k+1}h^*_{k+1} \le Y^{*T}_{k+1}g^*_{k+1} = Y^TW_{k+1}g_{k+1}$$
by Lemma 1.11. Hence $g_{k+1}$ maximizes $Y^TW_{k+1}h$ subject to $h^TW_{k+1}h = 1$ and $A^TW_{k+1}h = \Phi$.
k+1 k+1 


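Now consider
$$h = \frac{W_{k+1}^{-1}W_k\bigl(Y - A\hat X^p_k\bigr)}{\bigl[\bigl(Y - A\hat X^p_k\bigr)^TW_kW_{k+1}^{-1}W_k\bigl(Y - A\hat X^p_k\bigr)\bigr]^{1/2}},$$
which satisfies:

a)
$$h^TW_{k+1}h = \frac{\bigl(Y - A\hat X^p_k\bigr)^TW_kW_{k+1}^{-1}W_{k+1}W_{k+1}^{-1}W_k\bigl(Y - A\hat X^p_k\bigr)}{\bigl(Y - A\hat X^p_k\bigr)^TW_kW_{k+1}^{-1}W_k\bigl(Y - A\hat X^p_k\bigr)} = 1,$$
since $W_{k+1}^{-1}$ is the pseudoinverse of $W_{k+1}$ by the remarks after Definition 1.9.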

b) By Definition 1.9, the $i$th diagonal element of $W_{k+1}W_{k+1}^{-1}$ is 1 if $w^i_{k+1} > 0$ and 0 if $w^i_{k+1} = 0$. Thus the $i$th diagonal element of $W_{k+1}W_{k+1}^{-1}W_k$ is $w^i_k$ if $w^i_{k+1} > 0$, and 0 if $w^i_{k+1} = 0$. By Lemma 1.11 d),
$$A^TW_k\bigl(Y - A\hat X^p_k\bigr) = \Phi.$$
The $i$th element of $W_k(Y - A\hat X^p_k)$ is simply $w^i_k(Y_i - A^i\hat X^p_k)$, which is the $i$th element of $W_{k+1}W_{k+1}^{-1}W_k(Y - A\hat X^p_k)$ whenever $w^i_{k+1} \ne 0$. The two remaining cases also give equal elements:

- If $w^i_k = 0$, the corresponding elements are trivially equal for any value of $w^i_{k+1}$, since $w^i_k(Y_i - A^i\hat X^p_k) = 0$.
- If $w^i_k > 0$ but $w^i_{k+1} = 0$, then $Y_i - A^i\hat X^p_k = 0$ by the remarks after Definition 2.2.

Therefore
$$W_{k+1}W_{k+1}^{-1}W_k\bigl(Y - A\hat X^p_k\bigr) = W_k\bigl(Y - A\hat X^p_k\bigr),$$
making
$$A^TW_{k+1}h = \frac{A^TW_{k+1}W_{k+1}^{-1}W_k\bigl(Y - A\hat X^p_k\bigr)}{\bigl[\bigl(Y - A\hat X^p_k\bigr)^TW_kW_{k+1}^{-1}W_k\bigl(Y - A\hat X^p_k\bigr)\bigr]^{1/2}} = \Phi.$$
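Let $r = 2(p-1)/p$ and $s = 2(p-1)/(p-2)$, and note that $(1/r) + (1/s) = 1$. Using the definition of $W_{k+1}$, calculate the following expressions.

a)
$$\bigl[\operatorname{tr}W_{k+1}^{p/(p-2)}\bigr]^{(p-2)/(2p)} = \frac{\bigl(\operatorname{tr}\bigl\{[W_kS(\hat X^p_k)]^{p/(p-1)}\bigr\}\bigr)^{(p-2)/(2p)}}{\bigl(\operatorname{tr}\bigl\{[W_kS(\hat X^p_k)]^{(p-2)/(p-1)}\bigr\}\bigr)^{1/2}}$$

b)
$$\begin{aligned}
\bigl[(Y - A\hat X^p_k)^TW_kW_{k+1}^{-1}W_k(Y - A\hat X^p_k)\bigr]^{1/2}
&= \bigl(\operatorname{tr}\bigl\{[W_kS(\hat X^p_k)]^2[W_kS(\hat X^p_k)]^{-(p-2)/(p-1)}\bigr\}\bigr)^{1/2}\bigl(\operatorname{tr}\bigl\{[W_kS(\hat X^p_k)]^{(p-2)/(p-1)}\bigr\}\bigr)^{1/2}\\
&= \bigl(\operatorname{tr}\bigl\{[W_kS(\hat X^p_k)]^{p/(p-1)}\bigr\}\bigr)^{1/2}\bigl(\operatorname{tr}\bigl\{[W_kS(\hat X^p_k)]^{(p-2)/(p-1)}\bigr\}\bigr)^{1/2}
\end{aligned}$$
Here the negative power is taken in the pseudoinverse sense of Definition 1.9.

c)
$$\begin{aligned}
&\bigl[(Y - A\hat X^p_k)^TW_kW_{k+1}^{-1}W_k(Y - A\hat X^p_k)\bigr]^{1/2}\bigl[\operatorname{tr}W_{k+1}^{p/(p-2)}\bigr]^{(p-2)/(2p)}\\
&\quad= \bigl(\operatorname{tr}\bigl\{[W_kS(\hat X^p_k)]^{p/(p-1)}\bigr\}\bigr)^{(p-1)/p}\\
&\quad= \bigl(\operatorname{tr}\bigl\{\bigl[S(\hat X^p_k)^{p/(p-1)}W_k^{p/[2(p-1)]}\bigr]\bigl[W_k^{p/[2(p-1)]}\bigr]\bigr\}\bigr)^{(p-1)/p} = \bigl[\operatorname{tr}(D_1D_2)\bigr]^{(p-1)/p}\\
&\quad\le \bigl[(\operatorname{tr}D_1^r)^{1/r}(\operatorname{tr}D_2^s)^{1/s}\bigr]^{(p-1)/p}\\
&\quad= \Bigl\{\bigl[\operatorname{tr}S(\hat X^p_k)^2W_k\bigr]^{p/[2(p-1)]}\bigl[\operatorname{tr}W_k^{p/(p-2)}\bigr]^{(p-2)/[2(p-1)]}\Bigr\}^{(p-1)/p}\\
&\quad= \bigl[\operatorname{tr}W_kS(\hat X^p_k)^2\bigr]^{1/2}\bigl[\operatorname{tr}W_k^{p/(p-2)}\bigr]^{(p-2)/(2p)},
\end{aligned}$$
where $D_1 = S(\hat X^p_k)^{p/(p-1)}W_k^{p/[2(p-1)]}$ and $D_2 = W_k^{p/[2(p-1)]}$,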

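by Lemma 1.12 (the Hölder inequality in matrix trace notation). By the same lemma, equality holds if and only if there is an $\alpha$ such that
$$D_1^r = \alpha D_2^s,$$
that is,
$$W_kS(\hat X^p_k)^2 = \alpha W_k^{p/(p-2)},$$
or, on the indices with nonzero weights,
$$S(\hat X^p_k) = \alpha^{1/2}W_k^{1/(p-2)}. \qquad (2.4)$$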


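But then
$$W_{k+1} = \frac{[W_kS(\hat X^p_k)]^{(p-2)/(p-1)}}{\operatorname{tr}\{[W_kS(\hat X^p_k)]^{(p-2)/(p-1)}\}} = \frac{[\alpha^{1/2}W_k^{(p-1)/(p-2)}]^{(p-2)/(p-1)}}{\operatorname{tr}\{[\alpha^{1/2}W_k^{(p-1)/(p-2)}]^{(p-2)/(p-1)}\}} = \frac{W_k}{\operatorname{tr}W_k} = W_k,$$
since $\operatorname{tr}W_k = 1$. Equality would therefore contradict the hypothesis of this case, $W_{k+1} \ne W_k$. Consequently the inequality is strict.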

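Therefore, using Lemma 1.11 d) and result a) above,
$$\sigma_{k+1} = \frac{Y^TW_{k+1}g_{k+1}}{\bigl[\operatorname{tr}W_{k+1}^{p/(p-2)}\bigr]^{(p-2)/(2p)}}.$$
Using calculation c),
$$\begin{aligned}
\sigma_{k+1} &\ge \frac{Y^TW_{k+1}h}{\bigl[\operatorname{tr}W_{k+1}^{p/(p-2)}\bigr]^{(p-2)/(2p)}}
= \frac{\bigl(Y - A\hat X^p_k\bigr)^TW_k\bigl(Y - A\hat X^p_k\bigr)}{\bigl[\bigl(Y - A\hat X^p_k\bigr)^TW_kW_{k+1}^{-1}W_k\bigl(Y - A\hat X^p_k\bigr)\bigr]^{1/2}\bigl[\operatorname{tr}W_{k+1}^{p/(p-2)}\bigr]^{(p-2)/(2p)}}\\
&> \frac{\bigl[\bigl(Y - A\hat X^p_k\bigr)^TW_k\bigl(Y - A\hat X^p_k\bigr)\bigr]^{1/2}}{\bigl[\operatorname{tr}W_k^{p/(p-2)}\bigr]^{(p-2)/(2p)}} = \sigma_k. \qquad (2.5)
\end{aligned}$$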
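The strict increase of $\sigma_k$ rests on two ingredients: the diagonal form of the trace Hölder inequality (Lemma 1.12) with $r = 2(p-1)/p$ and $s = 2(p-1)/(p-2)$, and its equality case (2.4). The NumPy sketch below is only a numerical sanity check of that step, under stated assumptions. It takes $S(\cdot)$ to be the diagonal matrix of absolute residuals $|Y_i - A^iX|$, which is what the formulas above imply, and it uses random diagonals in place of actual iterates. The helper names `holder_sides` and `lemma_24_factors` are hypothetical.

```python
import numpy as np

def holder_sides(d1, d2, r, s):
    """Left and right sides of tr(D1 D2) <= (tr D1^r)^(1/r) (tr D2^s)^(1/s)
    for nonnegative diagonal matrices given by their diagonals."""
    lhs = np.sum(d1 * d2)
    rhs = np.sum(d1 ** r) ** (1.0 / r) * np.sum(d2 ** s) ** (1.0 / s)
    return lhs, rhs


def lemma_24_factors(w, res, p):
    """Diagonals of D1 = S^{p/(p-1)} W^{p/[2(p-1)]} and D2 = W^{p/[2(p-1)]}."""
    d1 = res ** (p / (p - 1)) * w ** (p / (2 * (p - 1)))
    d2 = w ** (p / (2 * (p - 1)))
    return d1, d2


p = 4.0
r, s = 2 * (p - 1) / p, 2 * (p - 1) / (p - 2)   # 1/r + 1/s = 1
rng = np.random.default_rng(0)
w = rng.random(8)
w /= w.sum()                        # stand-in for W_k, with tr W_k = 1
res = rng.random(8)                 # stand-in for |Y_i - A^i X_k|

lhs, rhs = holder_sides(*lemma_24_factors(w, res, p), r, s)
print(lhs, rhs)                     # strict inequality unless S is proportional to W^{1/(p-2)}

# Equality case (2.4): S = alpha^{1/2} W^{1/(p-2)}, here with alpha^{1/2} = 1.7.
res_eq = 1.7 * w ** (1 / (p - 2))
lhs_eq, rhs_eq = holder_sides(*lemma_24_factors(w, res_eq, p), r, s)
print(lhs_eq, rhs_eq)
```

Raising `lhs` and `rhs` to the power $(p-1)/p$ reproduces the two ends of calculation c).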


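LEMMA 2.5: Let $X^p$ be the best approximation to $X$ on $A$. Then
$$\sigma_k \le C_p = \bigl[\operatorname{tr}S(X^p)^p\bigr]^{1/p}.$$

Proof: Since $\hat X^p_k$ is the best least squares approximation with weights $W_k$,
$$\operatorname{tr}\bigl[W_kS(\hat X^p_k)^2\bigr] \le \operatorname{tr}\bigl[W_kS(X^p)^2\bigr].$$
Let
$$r = p/(p-2),\qquad s = p/2,$$
and note that $(1/r) + (1/s) = 1$. With $D_1 = W_k$ and $D_2 = S(X^p)^2$, we find
$$\begin{aligned}
\sigma_k^2 &= \frac{\operatorname{tr}\bigl[W_kS(\hat X^p_k)^2\bigr]}{\bigl[\operatorname{tr}W_k^{p/(p-2)}\bigr]^{(p-2)/p}}
\le \frac{\operatorname{tr}\bigl[W_kS(X^p)^2\bigr]}{\bigl[\operatorname{tr}W_k^{p/(p-2)}\bigr]^{(p-2)/p}}
= \frac{\operatorname{tr}(D_1D_2)}{\bigl[\operatorname{tr}W_k^{p/(p-2)}\bigr]^{(p-2)/p}}\\
&\le \frac{(\operatorname{tr}D_1^r)^{1/r}(\operatorname{tr}D_2^s)^{1/s}}{\bigl[\operatorname{tr}W_k^{p/(p-2)}\bigr]^{(p-2)/p}}
= \frac{\bigl[\operatorname{tr}W_k^{p/(p-2)}\bigr]^{(p-2)/p}\bigl\{\operatorname{tr}\bigl[S(X^p)^2\bigr]^{p/2}\bigr\}^{2/p}}{\bigl[\operatorname{tr}W_k^{p/(p-2)}\bigr]^{(p-2)/p}}
= \bigl[\operatorname{tr}S(X^p)^p\bigr]^{2/p}.
\end{aligned}$$
Hence $\sigma_k \le \bigl[\operatorname{tr}S(X^p)^p\bigr]^{1/p} = C_p$.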
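Because $C_p \le \|Y - AX\|_p$ for every $X$, the bound of Lemma 2.5 can be spot-checked without computing $X^p$. For any trace-one weights, $\sigma$ should not exceed the $\ell_p$ residual of any competitor $X$. A minimal NumPy sketch follows, with these assumptions:

- $S(\cdot)$ is the same diagonal matrix of absolute residuals.
- The weighted least squares estimate is computed as $(A^TWA)^+A^TWY$ with `numpy.linalg.pinv`.
- The ordinary least squares fit serves as the competitor.

The function names are hypothetical.

```python
import numpy as np

def sigma_for_weights(A, Y, w, p):
    """sigma (reconstructed form used in Lemmas 2.4-2.5) for diagonal weights w, tr W = 1."""
    W = np.diag(w)
    X_hat = np.linalg.pinv(A.T @ W @ A) @ A.T @ W @ Y    # weighted LS estimate
    res = np.abs(Y - A @ X_hat)                           # diagonal of S(X_hat)
    return (np.sqrt(np.sum(w * res ** 2))
            / np.sum(w ** (p / (p - 2))) ** ((p - 2) / (2 * p)))


def lp_residual(A, Y, X, p):
    """||Y - AX||_p, an upper bound for C_p for every X."""
    return np.sum(np.abs(Y - A @ X) ** p) ** (1.0 / p)


rng = np.random.default_rng(1)
A = rng.standard_normal((15, 3))
Y = rng.standard_normal(15)
p = 4.0
X_ls = np.linalg.lstsq(A, Y, rcond=None)[0]      # an arbitrary competitor

worst_gap = np.inf
for _ in range(200):
    w = rng.random(15) * (rng.random(15) > 0.3)   # roughly 30% zero weights
    if np.count_nonzero(w) < 3:
        continue                                  # keep A^T W A nonsingular
    w /= w.sum()
    gap = lp_residual(A, Y, X_ls, p) - sigma_for_weights(A, Y, w, p)
    worst_gap = min(worst_gap, gap)
print(worst_gap)
```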


Since $\{\sigma_k\}$ is bounded above and monotone, it has a limit. Denote this by
$$\sigma^0 = \lim_{k\to\infty}\sigma_k.$$
Since $\Phi \le W_k \le I$ elementwise for all $k$, the sequence $\{W_k\}$ is also bounded. Hence there exists a subsequence of $\{W_k\}$, say $\{W_{k(i)}\}$, which converges, say to $W_0$.

The sequence of index sets $\{J_k\}$ is monotonically decreasing, as in the remarks prior to Lemma 2.4, and hence converges, say to $J_0$. Note that $J_0$ is not null, since the sequence $\{\sigma_k\}$ is monotonically increasing and a null $J_0$ would imply $\sigma^0 = 0$. Furthermore, since $J_k \supseteq J_{k+1}$ for all $k$, each convergent subsequence must have the same limiting index set $J_0$.
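The iteration analysed in Lemmas 2.3 to 2.5 can be sketched as follows. This is a hedged reconstruction from the formulas in this section, not a transcription of Definition 2.2, which lies outside this excerpt. It assumes:

- uniform starting weights;
- $S(X) = \operatorname{diag}[|Y_i - A^iX|]$;
- the update $W_{k+1} = [W_kS(\hat X^p_k)]^{(p-2)/(p-1)}/\operatorname{tr}\{[W_kS(\hat X^p_k)]^{(p-2)/(p-1)}\}$;
- $p > 2$.

`lp_reweighting` is a hypothetical name. The function records $\sigma_k$, so the monotone, bounded behaviour used above can be observed directly.

```python
import numpy as np

def lp_reweighting(A, Y, p, iters=40):
    """Weighted least squares iteration for the l_p problem (p > 2), as
    reconstructed from Lemmas 2.3-2.8.  Returns a list of (X_hat_k, sigma_k)."""
    if p <= 2:
        raise ValueError("this scheme assumes p > 2")
    m = A.shape[0]
    w = np.full(m, 1.0 / m)                       # W_1 with tr W_1 = 1
    history = []
    for _ in range(iters):
        W = np.diag(w)
        X_hat = np.linalg.pinv(A.T @ W @ A) @ A.T @ W @ Y
        res = np.abs(Y - A @ X_hat)               # diagonal of S(X_hat_k)
        sigma = (np.sqrt(np.sum(w * res ** 2))
                 / np.sum(w ** (p / (p - 2))) ** ((p - 2) / (2 * p)))
        history.append((X_hat, sigma))
        t = (w * res) ** ((p - 2) / (p - 1))      # [W_k S(X_hat_k)]^{(p-2)/(p-1)}
        if t.sum() == 0.0:                        # exact fit on the current J_k
            break
        w = t / t.sum()                           # keeps tr W_{k+1} = 1
    return history


rng = np.random.default_rng(2)
A = rng.standard_normal((25, 3))
Y = rng.standard_normal(25)
p = 4.0
hist = lp_reweighting(A, Y, p)
sigmas = [s for _, s in hist]
# Lemma 2.4: nondecreasing.  Lemma 2.5: never above any attainable l_p residual.
print(all(b >= a * (1 - 1e-9) for a, b in zip(sigmas, sigmas[1:])))
print(all(s <= np.sum(np.abs(Y - A @ X) ** p) ** (1 / p) * (1 + 1e-9) for X, s in hist))
```

The restart logic of Theorem 2.11 is deliberately left out of this sketch.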

LEMMA 2.6: Let $\hat X^p_0$ be the best least squares approximation to $X$ with weights $W_0$. Then $\sigma^0 > 0$ and
$$\lim_{k(i)\to\infty}\hat X^p_{k(i)} = \hat X^p_0. \qquad (2.6)$$

Proof: As was mentioned in the remarks prior to Lemma 2.4, there must exist a number $K$ such that $J_k = J_0$ for all $k \ge K$. Thus
$$r\bigl(A^TW_kA\bigr) = r\bigl(A^TW_0A\bigr)\quad\text{for } k \ge K,$$
where $r(\cdot)$ is the rank of the matrix.
Further,
$$\bigl(A^TWA\bigr)^+\bigl(A^TWA\bigr) = \bigl[\bigl(A^TWA\bigr)^+\bigl(A^TWA\bigr)\bigr]^T = \bigl(A^TWA\bigr)^T\bigl[\bigl(A^TWA\bigr)^+\bigr]^T = \bigl(A^TWA\bigr)\bigl(A^TWA\bigr)^+,$$
so that $(A^TWA)^+$ commutes with $A^TWA$. Now take any $W$ such that $\{i : w^i > 0\} = J_0$. Since $A^TWA$ is differentiable in each $w^i$, $(A^TWA)^+$ is also differentiable, showing that $(A^TWA)^+$ is continuous in the weights $W$ [19]. Consequently, $(A^TW_{k(i)}A)^+$ is continuous in the weights for $k(i) \ge K$, and therefore
$$\hat X^p_{k(i)} = \bigl(A^TW_{k(i)}A\bigr)^+A^TW_{k(i)}Y$$
is continuous in the weights for $k(i) \ge K$, implying equation (2.6). Also, $Y - A\hat X^p_{k(i)}$ is continuous in the weights, so that $\sigma_{k(i)}$ is continuous in the weights by Definition 2.2. Therefore
$$0 < \sigma_1 \le \sigma^0 = \lim_{k\to\infty}\sigma_k = \lim_{k(i)\to\infty}\sigma_{k(i)}.$$
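The commutation identity in the proof of Lemma 2.6 holds for any symmetric matrix and its Moore-Penrose inverse. This includes the singular matrices $A^TWA$ that arise once some weights are zero. The NumPy check below uses random data in place of $A$ and $W$. `pinv_tol` is a hypothetical helper that computes the pseudoinverse from the SVD with an explicit relative cutoff, so that the rank decision does not depend on library defaults.

```python
import numpy as np

def pinv_tol(M, rel=1e-10):
    """Moore-Penrose inverse via the SVD, dropping singular values below rel * max."""
    U, sv, Vt = np.linalg.svd(M)
    keep = sv > rel * sv.max()
    return (Vt[keep].T / sv[keep]) @ U[:, keep].T


rng = np.random.default_rng(3)
A = rng.standard_normal((10, 4))
w = rng.random(10)
w[:8] = 0.0                       # only two nonzero weights
M = A.T @ np.diag(w) @ A          # symmetric, rank 2, like A^T W A with |J| = 2
M_plus = pinv_tol(M)

sv = np.linalg.svd(M, compute_uv=False)
rank = int(np.sum(sv > 1e-10 * sv.max()))
print(rank, np.allclose(M_plus @ M, M @ M_plus))
```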



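LEMMA 2.7: $\lim_{k\to\infty}W_k = W_0$.

Proof: $\lim_{k\to\infty}\sigma_k = \sigma^0$ implies $\lim_{k\to\infty}(\sigma_{k+1} - \sigma_k) = 0$. By the proof of Lemma 2.4, the two inequalities of equation (2.5) must then approach equality. The second inequality is a result of the Hölder inequality. Using $D_1$, $D_2$, $r$, and $s$ as defined in that proof, we have
$$\lim_{k\to\infty}\Bigl[(\operatorname{tr}D_1^r)^{1/r}(\operatorname{tr}D_2^s)^{1/s} - \operatorname{tr}(D_1D_2)\Bigr] = 0.$$
This further implies the equality condition
$$\lim_{k\to\infty}\bigl(D_1^r - \alpha D_2^s\bigr) = \Phi$$
for a real constant $\alpha$. By equation (2.4), this is, on the nonzero weights,
$$\lim_{k\to\infty}\bigl[S(\hat X^p_k) - \alpha^{1/2}W_k^{1/(p-2)}\bigr] = \Phi.$$
Consequently, recalling that $\operatorname{tr}W_k = 1$, we obtain
$$\begin{aligned}
\lim_{k\to\infty}(W_{k+1} - W_k) &= \lim_{k\to\infty}\left\{\frac{[W_kS(\hat X^p_k)]^{(p-2)/(p-1)}}{\operatorname{tr}\{[W_kS(\hat X^p_k)]^{(p-2)/(p-1)}\}} - W_k\right\}\\
&= \lim_{k\to\infty}\left\{\frac{[W_k\alpha^{1/2}W_k^{1/(p-2)}]^{(p-2)/(p-1)}}{\operatorname{tr}\{[W_k\alpha^{1/2}W_k^{1/(p-2)}]^{(p-2)/(p-1)}\}} - W_k\right\}\\
&= \lim_{k\to\infty}\left\{\frac{\alpha^{(p-2)/[2(p-1)]}W_k}{\alpha^{(p-2)/[2(p-1)]}\operatorname{tr}W_k} - W_k\right\}
= \lim_{k\to\infty}\left\{\frac{W_k}{\operatorname{tr}W_k} - W_k\right\} = \Phi.
\end{aligned}$$
This implies that $\{W_k\}$ is a Cauchy sequence. Since one subsequence converges to $W_0$,
$$\lim_{k\to\infty}W_k = W_0.$$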
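The last step of Lemma 2.7 amounts to saying that the weight update leaves $W$ unchanged exactly when the equality condition (2.4) holds on the nonzero weights. A small NumPy sketch of the update map follows, under the same assumption on $S(\cdot)$ as before. `update_weights` is a hypothetical name, and the residual vectors are made up.

```python
import numpy as np

def update_weights(w, res, p):
    """W_{k+1} = [W_k S]^{(p-2)/(p-1)} / tr{[W_k S]^{(p-2)/(p-1)}} on diagonals."""
    t = (w * res) ** ((p - 2) / (p - 1))
    return t / t.sum()


p = 5.0
rng = np.random.default_rng(4)
w = rng.random(7)
w[2] = 0.0                          # one index outside J_k
w /= w.sum()

# Residuals satisfying (2.4) on the nonzero weights; index 2 is unconstrained.
res_fixed = 0.3 * w ** (1 / (p - 2))
res_fixed[2] = 1.0
print(np.allclose(update_weights(w, res_fixed, p), w))

# Generic residuals move the weights.
res_other = rng.random(7)
print(np.allclose(update_weights(w, res_other, p), w))
```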

LEMMA 2.8: The least squares estimate $\hat X^p_0$ is also the best approximation to $X$ with indices in $J_0$, and
$$\sigma^0 = \bigl[\operatorname{tr}I_0S(\hat X^p_0)^p\bigr]^{1/p}, \qquad (2.7)$$
where $I_0 = \operatorname{diag}[d_i = 1 \text{ if } i \in J_0;\ d_i = 0 \text{ otherwise}]$.

Proof: By Lemma 2.6, $\sigma_{k+1}$ is a continuous function of the weights $W_k$; denote this fact by $F(W_k) = \sigma_{k+1}$. Let $\{W_k\}$ correspond with $\{y_n\}$ and $\{\sigma_k\}$ with $\{x_n\}$ of Lemma 1.13. Recall that $\lim\sigma_k = \sigma^0$ and $\lim W_k = W_0$, so that $F(W_0) = \sigma^0$ by Lemma 1.13.


Now suppose the algorithm were restarted with $W_0$ as the initial weights. Then $\hat X^p_0$ and $\sigma^0$ are the estimates for the weights $W_0$. The first actual iteration uses the weights
$$W_1' = \frac{[W_0S(\hat X^p_0)]^{(p-2)/(p-1)}}{\operatorname{tr}\{[W_0S(\hat X^p_0)]^{(p-2)/(p-1)}\}},$$
with $\hat X'_1$ and $\sigma'_1$ corresponding to these weights. The function $F$ relating the weights at iteration $k$ with $\sigma_{k+1}$ is the same as for the first sequence. Hence $\sigma'_1 = F(W_0) = \sigma^0$, and therefore $W_1' = W_0$ by Lemma 2.4. Consequently
$$W_0 = \frac{[W_0S(\hat X^p_0)]^{(p-2)/(p-1)}}{\operatorname{tr}\{[W_0S(\hat X^p_0)]^{(p-2)/(p-1)}\}}.$$
Multiply both sides by $W_0^{-(p-2)/(p-1)}$, note that
$$W_0^{(p-2)/(p-1)}W_0^{-(p-2)/(p-1)} = I_0,$$
and use the remarks after Definition 1.9. This gives
$$W_0^{1/(p-1)} = \frac{I_0S(\hat X^p_0)^{(p-2)/(p-1)}}{\operatorname{tr}\{[W_0S(\hat X^p_0)]^{(p-2)/(p-1)}\}},$$
or
$$W_0 = \frac{I_0S(\hat X^p_0)^{p-2}}{\bigl(\operatorname{tr}\{[W_0S(\hat X^p_0)]^{(p-2)/(p-1)}\}\bigr)^{p-1}}. \qquad (2.8)$$
By Lemma 1.11 d), $\hat X^p_0$ is the least squares approximation with weights $W_0$, so the normal equations for any $X$ are
$$(AX)^TW_0\bigl(Y - A\hat X^p_0\bigr) = X^TA^TW_0^{1/2}W_0^{1/2}\bigl(Y - A\hat X^p_0\bigr) = X^TA^{*T}\bigl(Y^* - A^*\hat X^p_0\bigr) = X^T\bigl[A^{*T}Y^* - A^{*T}A^*\hat X^p_0\bigr] = 0.$$
Using equation (2.8) and multiplying out the denominator, we obtain
$$\begin{aligned}
\Phi &= A^TI_0S(\hat X^p_0)^{p-2}\bigl(Y - A\hat X^p_0\bigr)
= \sum_{i\in J_0}\bigl|Y_i - A^i\hat X^p_0\bigr|^{p-2}\bigl(Y_i - A^i\hat X^p_0\bigr)(A^i)^T\\
&= \sum_{i\in J_0}\bigl|Y_i - A^i\hat X^p_0\bigr|^{p-1}\operatorname{sgn}\bigl(Y_i - A^i\hat X^p_0\bigr)(A^i)^T
= -\frac{1}{p}\,\frac{\partial}{\partial X}\sum_{i\in J_0}\bigl|Y_i - A^iX\bigr|^p\,\Big|_{X = \hat X^p_0}, \qquad (2.9)
\end{aligned}$$
where $\operatorname{sgn}(x) = 1$ if $x > 0$ and $-1$ if $x < 0$ is the signum function. The sum
$$\sum_{i\in J_0}\bigl|Y_i - A^iX\bigr|^p = \operatorname{tr}\bigl[S(X)^pI_0\bigr]$$
is a convex function of $X$, and by (2.9) its gradient vanishes at $\hat X^p_0$. Therefore $\hat X^p_0$ minimizes this sum, and consequently $\hat X^p_0$ is the best $\ell_p$ estimate with indices in $J_0$.
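Equation (2.9) rests on the identity $\frac{\partial}{\partial X}\sum_i|Y_i - A^iX|^p = -p\,A^T\operatorname{diag}(|e_i|^{p-2})\,e$, with $e = Y - AX$. In words, the weighted normal equations with weights proportional to $|e_i|^{p-2}$ are the stationarity conditions of the $\ell_p$ objective. The NumPy sketch checks the identity by central finite differences on random data, keeping all indices (that is, $J_0 = M$). The names are illustrative only.

```python
import numpy as np

def lp_objective(A, Y, X, p):
    """sum_i |Y_i - A^i X|^p."""
    return np.sum(np.abs(Y - A @ X) ** p)


def lp_gradient(A, Y, X, p):
    """-p A^T diag(|e|^{p-2}) e: the weighted normal-equation residual scaled by -p."""
    e = Y - A @ X
    return -p * A.T @ (np.abs(e) ** (p - 2) * e)


rng = np.random.default_rng(5)
A = rng.standard_normal((12, 3))
Y = rng.standard_normal(12)
X = rng.standard_normal(3)
p, h = 4.0, 1e-6

numeric = np.array([
    (lp_objective(A, Y, X + h * u, p) - lp_objective(A, Y, X - h * u, p)) / (2 * h)
    for u in np.eye(3)
])
print(np.allclose(numeric, lp_gradient(A, Y, X, p), rtol=1e-5, atol=1e-6))
```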

To establish equation (2.7), use (2.8) and calculate



The proof of Lemma 2.7 given in reference [46] is incorrect. Their proof of Lemma 2.8 is incomplete and also incorrect as stated, even under the considerably stronger hypotheses which they impose upon the model. Lemma 2.7 shows that the convergent subsequence $\{W_{k(i)}\}$ is in reality the whole sequence, so Lemma 2.6 implies $\lim_{k\to\infty}\hat X^p_k = \hat X^p_0$. The lemmas and this fact are summarized for easy reference in the following theorem.


THEOREM 2.9: The algorithm defined in Definition 2.2 has the following properties:

a) $J_k \supseteq J_{k+1}$ for all $k$, and $\lim_{k\to\infty}J_k = J_0$, a nonempty set;

b) $\lim_{k\to\infty}W_k = W_0$, and $\operatorname{tr}W_0 = 1$;

c) $\lim_{k\to\infty}\hat X^p_k = \hat X^p_0$, the best approximation on the set of nonzero weights $J_0$;

d) $\lim_{k\to\infty}\sigma_k = \bigl[\operatorname{tr}I_0S(\hat X^p_0)^p\bigr]^{1/p}$.


The above theorem only states that $\hat X^p_0$ is the best $\ell_p$ estimate on $J_0$. It does not say this for $\{1, 2, \ldots, m\} = M$. If $J_0 = M$, there is no difficulty. If $J_0 \ne M$, then $J_0 \subset M$ and $\sigma^0 < \sigma^0_p$, where $\sigma^0_p$ is the value of $\sigma$ for the $\ell_p$ approximation on $M$. This holds because the approximation over a submatrix of $A$ will have a smaller error than an approximation over all of $A$. This suggests restarting the algorithm with an index set $J_{1,1} \supset J_0$ and a
LEMMA 2.10: Suppose $Y_{i^*} = A^{i^*}\hat X^p(\lambda)$ for some $\lambda = \lambda_1 \ne 0, 1$. Then the equality holds for all $\lambda$. Here $\hat X^p(\lambda)$ is the least squares solution to $Y = AX$ with weight matrix
$$W(\lambda, i^*) = (1 - \lambda)W_0 + \lambda U(i^*),$$
where $U(i^*) = \operatorname{diag}[u_i = 1 \text{ for } i = i^* \in M - J_0, \text{ and zero otherwise}]$.

Proof: Partition the system of equations $Y = AX$ into two parts,
$$\begin{bmatrix}Y_{(1)}\\ Y_{i^*}\end{bmatrix} = \begin{bmatrix}A_{(1)}\\ A^{i^*}\end{bmatrix}X,$$
where the partitions $Y_{(1)}$ and $A_{(1)}$ do not include $Y_{i^*}$ and $A^{i^*}$, respectively. This new system is in the partitioned form required in Theorem 1.10, with $Y_{(2)} = Y_{i^*}$ and $A_{(2)} = A^{i^*}$.

Let $\hat X$ be the least squares solution over the complete system, and let $\hat X_1$ be the least squares solution over $Y_{(1)} = A_{(1)}X$ with weights $W_{(1)}$. Equation (1.5) shows that $Y_{(2)} - A_{(2)}\hat X = \Phi$ if and only if $Y_{(2)} - A_{(2)}\hat X_1 = \Phi$. The only requirement is that $W_{(2)}$ is nonsingular, which is true if and only if $\lambda \ne 0$. Now, if $\lambda \ne 1$,
$$\hat X_1(\lambda) = \bigl(A_{(1)}^TW_{(1)}A_{(1)}\bigr)^+A_{(1)}^TW_{(1)}Y_{(1)} = \bigl(A^T(1-\lambda)W_0A\bigr)^+A^T(1-\lambda)W_0Y = \bigl(A^TW_0A\bigr)^+A^TW_0Y.$$
Since $Y_{i^*} = A^{i^*}\hat X^p(\lambda_1)$, it follows that $Y_{i^*} - A^{i^*}\hat X_1 = \Phi$ for all $\lambda$. Hence $Y_{i^*} = A^{i^*}\hat X^p(\lambda)$ for all $\lambda \ne 1, 0$. Observe also that $W_{(1)} = \Phi$ when $\lambda = 1$, so that $\hat X = (A^{i^*})^+Y_{i^*}$ satisfies $Y_{i^*} = A^{i^*}\hat X$.
THEOREM 2.11: Suppose $J_0$ is a proper subset of $M$, and let
$$W_1(\lambda, i^*) = (1 - \lambda)W_0 + \lambda U(i^*),$$
where $U(i^*) = \operatorname{diag}[u_i = 1 \text{ for } i = i^* \in M - J_0, \text{ and zero otherwise}]$. Let $\hat X^p(\lambda)$ be the least squares solution using weights $W_1(\lambda, i^*)$, and suppose that
$$\operatorname{tr}\bigl[U(i^*)S(\hat X^p(\lambda))\bigr] > 0 \qquad (2.10)$$
for some $\lambda_1$ with $0 < \lambda_1 < \lambda_0 < 1$. Then the algorithm may be restarted with the weights $W_1(\lambda, i^*)$. For some $\lambda$ with $\lambda_1 \le \lambda \le \lambda_0$,
$$\sigma(\lambda) > \sigma^0,$$
where
$$\sigma(\lambda) = \left\{\frac{\operatorname{tr}\bigl[W_1(\lambda, i^*)S(\hat X^p(\lambda))^2\bigr]}{\bigl[\operatorname{tr}W_1(\lambda, i^*)^{p/(p-2)}\bigr]^{(p-2)/p}}\right\}^{1/2}.$$
If equation (2.10) holds for all $i^* \in M - J_0$, then the best $\ell_p$ approximation to $X$ on $M$ may be obtained after a finite number of restarts.

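Proof: Observe first that $\operatorname{tr}[W_0S(\hat X^p_0)^2]$ is constant in $\lambda$ and that
$$\lim_{\lambda\to0}\bigl|Y_{i^*} - A^{i^*}\hat X^p(\lambda)\bigr|^2 = \bigl|Y_{i^*} - A^{i^*}\hat X^p(0)\bigr|^2 = \operatorname{tr}\bigl[U(i^*)S(\hat X^p_0)^2\bigr].$$
This quantity is therefore bounded, since $\hat X^p(\lambda)$ is continuous in $\lambda$, say for $0 \le \lambda \le \lambda_0$. Choose the bound $b$ as follows:

- If $\operatorname{tr}[U(i^*)S(\hat X^p_0)^2] > 0$, let $b$ be this bound.
- If $\operatorname{tr}[U(i^*)S(\hat X^p_0)^2] = 0$, restrict $0 < \lambda_1 \le \lambda \le \lambda_0$ and set
$$b = \inf_{\lambda_1\le\lambda\le\lambda_0}\operatorname{tr}\bigl[U(i^*)S(\hat X^p(\lambda))^2\bigr].$$

Since $\operatorname{tr}[U(i^*)S(\hat X^p(\lambda))]$ is nonzero for some $\lambda_1$ with $0 < \lambda_1 < \lambda_0$, it is nonzero over the whole interval by the contrapositive of Lemma 2.10. Further, $(A^TW_1(\lambda, i^*)A)^+$ is continuous in $\lambda$ by the proof of Lemma 2.6. In turn, $\hat X^p(\lambda)$, $S(\hat X^p(\lambda))$, and $\operatorname{tr}[U(i^*)S(\hat X^p(\lambda))^2]$ are continuous in $\lambda$. Hence the infimum $b(\lambda_1)$ over the closed interval $[\lambda_1, \lambda_0]$ is attained, and therefore $b(\lambda_1) > 0$.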


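Apply Lemma 1.14 with
$$x = \lambda/(1 - \lambda),\qquad c = \operatorname{tr}\bigl(W_0^{p/(p-2)}\bigr),\qquad r = p/(p-2).$$
It gives that $F(x) = \bigl(1 + x^r/c\bigr)^{(p-2)/p} - 1$ approaches zero more rapidly than $x$, or
$$b(\lambda_1)\,x - \operatorname{tr}\bigl[W_0S(\hat X^p_0)^2\bigr]F(x) > 0 \qquad (2.11)$$
if $x$ is sufficiently small, which is for $0 < \lambda < \lambda_2$, say. Observe that equation (2.10) converges to zero more slowly than any positive power of $x$. Consequently, we can find a $\lambda_1 < \lambda_2$ such that, for $\lambda_1 \le \lambda \le \min(\lambda_0, \lambda_2)$, equation (2.11) becomes
$$\begin{aligned}
0 &< b(\lambda_1)\,x - \operatorname{tr}\bigl[W_0S(\hat X^p_0)^2\bigr]\Bigl\{\bigl[1 + x^{p/(p-2)}/\operatorname{tr}(W_0^{p/(p-2)})\bigr]^{(p-2)/p} - 1\Bigr\}\\
&\le x\operatorname{tr}\bigl[U(i^*)S(\hat X^p(\lambda))^2\bigr] + \operatorname{tr}\bigl[W_0S(\hat X^p_0)^2\bigr] - \operatorname{tr}\bigl[W_0S(\hat X^p_0)^2\bigr]\bigl[1 + x^{p/(p-2)}/\operatorname{tr}(W_0^{p/(p-2)})\bigr]^{(p-2)/p}\\
&\le x\operatorname{tr}\bigl[U(i^*)S(\hat X^p(\lambda))^2\bigr] + \operatorname{tr}\bigl[W_0S(\hat X^p(\lambda))^2\bigr] - \operatorname{tr}\bigl[W_0S(\hat X^p_0)^2\bigr]\bigl[1 + x^{p/(p-2)}/\operatorname{tr}(W_0^{p/(p-2)})\bigr]^{(p-2)/p}.
\end{aligned}$$
Multiplying through by $(1-\lambda)\big/\bigl[(1-\lambda)^{p/(p-2)}\operatorname{tr}(W_0^{p/(p-2)}) + \lambda^{p/(p-2)}\bigr]^{(p-2)/p}$ gives
$$0 < \frac{\lambda\operatorname{tr}\bigl[U(i^*)S(\hat X^p(\lambda))^2\bigr] + (1-\lambda)\operatorname{tr}\bigl[W_0S(\hat X^p(\lambda))^2\bigr]}{\bigl[(1-\lambda)^{p/(p-2)}\operatorname{tr}(W_0^{p/(p-2)}) + \lambda^{p/(p-2)}\bigr]^{(p-2)/p}} - \frac{\operatorname{tr}\bigl[W_0S(\hat X^p_0)^2\bigr]}{\bigl[\operatorname{tr}W_0^{p/(p-2)}\bigr]^{(p-2)/p}} = \sigma(\lambda)^2 - (\sigma^0)^2,$$
so that $\sigma(\lambda) > \sigma^0$. The third inequality holds because $\hat X^p_0$ is the least squares solution with weights $W_0$, and therefore $\operatorname{tr}[W_0S(\hat X^p_0)^2] \le \operatorname{tr}[W_0S(\hat X^p(\lambda))^2]$.

The restarted sequence $\{\sigma_k\}$ is monotonically increasing, so it cannot converge to $\sigma^0$. It must then converge on a set $J_0(\lambda) \supset J_0$. Hence the best $\ell_p$ approximation on $M$ will be attained in a finite number of restarts.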
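The chain of inequalities above uses two facts about the restart weights when $i^* \notin J_0$. First, $\operatorname{tr}W_1(\lambda, i^*) = 1$. Second, $\operatorname{tr}W_1(\lambda, i^*)^{p/(p-2)} = (1-\lambda)^{p/(p-2)}\operatorname{tr}W_0^{p/(p-2)} + \lambda^{p/(p-2)}$, which equals $(1-\lambda)^{p/(p-2)}[c + x^{p/(p-2)}]$ with $x = \lambda/(1-\lambda)$ and $c = \operatorname{tr}W_0^{p/(p-2)}$. The NumPy sketch below builds the weights and checks both facts. `restart_weights` is a hypothetical helper, and the weight vector is made up for illustration.

```python
import numpy as np

def restart_weights(w0, i_star, lam):
    """Diagonal of W_1(lambda, i*) = (1 - lambda) W_0 + lambda U(i*)."""
    if w0[i_star] != 0.0:
        raise ValueError("i* must lie in M - J_0 (zero weight in W_0)")
    u = np.zeros_like(w0)
    u[i_star] = 1.0
    return (1.0 - lam) * w0 + lam * u


p = 4.0
r = p / (p - 2)
w0 = np.array([0.5, 0.0, 0.3, 0.2, 0.0])     # J_0 = {0, 2, 3}, tr W_0 = 1
lam, i_star = 0.1, 4
w1 = restart_weights(w0, i_star, lam)

c = np.sum(w0 ** r)
x = lam / (1 - lam)
print(np.isclose(w1.sum(), 1.0))
print(np.isclose(np.sum(w1 ** r), (1 - lam) ** r * c + lam ** r))
print(np.isclose(np.sum(w1 ** r), (1 - lam) ** r * (c + x ** r)))
```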

Rice [4,5] states that examples in which restarts are required under his hypotheses are very difficult to construct. If a restart is necessary, one can tell whether a solution will be reached in either of two ways. Either the nonzero index set reaches $M$, or the sequence $\{\sigma_k\}$ diverges, since it will no longer be bounded above by $C_p$ of Lemma 2.5.

The case not yet discussed occurs when
$$Y_{i^*} = A^{i^*}\hat X^p(\lambda) \qquad (2.12)$$
for some $\lambda \ne 0, 1$ and some $i^* \in M - J_0$. This may occur when either

a) the best $\ell_p$ approximation to $X$ on $M$ occurs when equation (2.12) holds. The best $\ell_p$ approximation to $X$ on $M$ is a least squares approximation with appropriate weights. Since (2.12) holds, we may assume without loss of generality that the $i^*$th weight is nonzero. By Theorem 1.10, the weighted least squares approximate (and therefore the best $\ell_p$ approximate) is then the weighted least squares approximate over $M - \{i^*\}$. Therefore $\hat X^p$ is the best $\ell_p$ approximation on $M$ if equation (2.12) holds for all $i^* \in M - J_0$. Otherwise, restart until equation (2.12) holds for all nonzero weight indices; or

b) the approximate $\hat X^p$ is a local best $\ell_p$ approximation. Recall that the weight matrices $W_k$ are a function of the initial weights $W_1$. An intermediate or local solution which satisfies equation (2.12) may exist. The only known remedy is to restart the algorithm with completely new weights, and no method of choosing these weights systematically is known.

3.3 Calculation of the p-q Generalized Inverse, p, q > 2

This section presents an algorithm which calculates the p-q generalized inverse
$$B = (I - F)A^gE$$
of a matrix $A$. Here $E$ and $F$ are the metric projections onto $R(A)$ and $N(A)$, respectively (see Section 2.3). The setting is the degenerate linear model (see Section 1.1)
$$Y = AX,$$
where $Y \in V_m = \ell_p(m)$, $X \in V_n = \ell_q(n)$, and $p, q > 2$. By Theorem 2.1 in Chapter II, the best approximate vector $B(Y)$ is unique, since $A$ is linear and the $\ell_p$ spaces are strictly convex for $1 < p, q < \infty$. It satisfies

1. $\bigl[\operatorname{tr}S(B(Y))^p\bigr]^{1/p} \le \bigl[\operatorname{tr}S(X)^p\bigr]^{1/p}$;

2. if there is an $X$ such that $\bigl[\operatorname{tr}S(B(Y))^p\bigr]^{1/p} = \bigl[\operatorname{tr}S(X)^p\bigr]^{1/p}$, then $\bigl[\operatorname{tr}D(B(Y))^q\bigr]^{1/q} \le \bigl[\operatorname{tr}D(X)^q\bigr]^{1/q}$,

for all $X \in V_n$, where $D(X) = \operatorname{diag}[|x_i|]$ and $x_i$ is the $i$th element of $X$.

LEMMA 3.1: $E(Y) = A\hat X^p$, where $\hat X^p$ is the best $\ell_p$ approximation to $X$.

Proof: If $\hat X^p$ is the best $\ell_p$ approximation to $X$, then for any $X \in V_n$
$$\Bigl[\sum_{i=1}^m\bigl|Y_i - A^i\hat X^p\bigr|^p\Bigr]^{1/p} = \bigl[\operatorname{tr}S(\hat X^p)^p\bigr]^{1/p} \le \bigl[\operatorname{tr}S(X)^p\bigr]^{1/p} = \Bigl[\sum_{i=1}^m\bigl|Y_i - A^iX\bigr|^p\Bigr]^{1/p}.$$
Now $\{Y : Y = AX,\ X \in V_n\} = R(A)$. The above inequality therefore shows that, among all vectors in $R(A)$, $A\hat X^p$ has the smallest $\ell_p$ distance from $Y$. Hence $A\hat X^p$ is the $\ell_p$, or metric, projection of $Y$ on $R(A)$, and consequently $E(Y) = A\hat X^p$.

LEMMA 3.2:
$$\{X : AX = E(Y)\} = \{X : X = \hat X^p - N,\ N \in N(A)\} = \bigl\{X : X = (A^TW_0A)^+A^TW_0Y - N,\ N \in N(A)\bigr\}.$$

Proof: Since $A\hat X^p = E(Y)$, we have $\hat X^p \in \{X : AX = E(Y)\} = P(Y)$. If $X \in P(Y)$, write $X = \hat X^p - N$; then
$$AX = A\hat X^p - AN = E(Y) - AN = E(Y),$$
so that $AN = \Phi$, and therefore $N \in N(A)$. Now let $W_0$ be the diagonal matrix of weights and $\hat X^p$ the weighted least squares approximation as in Section 3.2. Then
$$\hat X^p = (A^TW_0A)^+A^TW_0Y + \bigl[I - (A^TW_0A)^+A^TW_0A\bigr]Z$$
by equation (2.2), where $Z$ is an arbitrary $n\times1$ vector. Let $Z = \Phi$; then $A\hat X^p = E(Y)$ still holds, and the proof above applies to this case.

By Theorem 2.1 in Chapter II, computing $A^gE(Y)$ means obtaining an element $X$ of $V_n$ such that $AX = E(Y)$. Clearly $(A^TW_0A)^+A^TW_0Y$ is such a vector, and the set of all possible solutions has the form $(A^TW_0A)^+A^TW_0Y - N$, $N \in N(A)$.

LEMMA 3.3: If $r(N(A)) = r$, then there exists an $n\times r$ matrix $C$ such that for any $N \in N(A)$ there is a $Z \in V_r$ with $N = CZ$.

Proof: Let $\{b_i\}_{i=1}^r$ be a basis for $N(A)$. Then for any $N \in N(A)$ there are constants $\{z_i\}_{i=1}^r$ such that
$$N = \sum_{i=1}^r z_ib_i = CZ,\qquad C = [b_1\ \cdots\ b_r],\quad Z = (z_1, \ldots, z_r)^T.$$
Now, from the set of all possible solutions
$$X = (A^TW_0A)^+A^TW_0Y - CZ,\qquad Z \in V_r,$$
select the one, $X_0 = (A^TW_0A)^+A^TW_0Y - CZ_0$, such that
$$\bigl[\operatorname{tr}D(X_0)^q\bigr]^{1/q} \le \bigl[\operatorname{tr}D\bigl((A^TW_0A)^+A^TW_0Y - CZ\bigr)^q\bigr]^{1/q}$$
for any $Z \in V_r$. This is simply the $\ell_p$ problem of Section 3.2 with these substitutions:

- $q$ replacing $p$;
- $C$ replacing $A$;
- $Z$ replacing $X$;
- $(A^TW_0A)^+A^TW_0Y$ replacing $Y$.

Therefore the algorithm calculates the best estimate with weights, say $U_0$,
$$Z_0 = (C^TU_0C)^+C^TU_0(A^TW_0A)^+A^TW_0Y,$$
and the solution
$$X_0 = (A^TW_0A)^+A^TW_0Y - C(C^TU_0C)^+C^TU_0(A^TW_0A)^+A^TW_0Y = \bigl[I - C(C^TU_0C)^+C^TU_0\bigr](A^TW_0A)^+A^TW_0Y.$$
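Lemma 3.3 only needs some basis of $N(A)$ as the columns of $C$. One convenient numerical choice, assumed here rather than taken from the text, is the set of right singular vectors of $A$ belonging to (numerically) zero singular values. These are orthonormal, so $Z$ is recovered from $N = CZ$ as $C^TN$. The NumPy sketch below uses a hypothetical helper `null_space_basis` with an explicit relative rank cutoff.

```python
import numpy as np

def null_space_basis(A, rel=1e-10):
    """n x r matrix C whose columns are an orthonormal basis of N(A)."""
    _, sv, Vt = np.linalg.svd(A)               # full_matrices=True: Vt is n x n
    rank = int(np.sum(sv > rel * sv.max()))
    return Vt[rank:].T


rng = np.random.default_rng(6)
A = rng.standard_normal((3, 2)) @ rng.standard_normal((2, 5))   # 3 x 5, rank 2
C = null_space_basis(A)
Z = rng.standard_normal(C.shape[1])
N = C @ Z                                       # an element of N(A)
print(C.shape, np.allclose(A @ C, 0.0), np.allclose(C.T @ N, Z))
```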

The above results can be summarized in the following theorem.

THEOREM 3.4: For $C$ defined as above:

a) $E(Y) = A(A^TW_0A)^+A^TW_0Y$, where $W_0$ is the diagonal matrix of least squares weights associated with the best approximation of $X$ in the linear approximation $Y = AX$;

b) $F(X) = C(C^TU_0C)^+C^TU_0X$, where $U_0$ is the diagonal matrix of least squares weights associated with the $\ell_q$ approximation of $Z$ in the linear approximation $X = CZ$;

c) the p-q generalized inverse $B$ of $A$ is
$$B(Y) = \bigl[I - C(C^TU_0C)^+C^TU_0\bigr](A^TW_0A)^+A^TW_0Y, \qquad (3.1)$$
where $W_0$ is as in a), and $U_0$ is as in b) with $X = (A^TW_0A)^+A^TW_0Y$.


Note that $B$ is not necessarily a linear operator, since both $U_0$ and $W_0$ depend on the vector approximated. If $U_0(\cdot)$ and $W_0(\cdot)$ denote this dependence, equation (3.1) can be written in functional notation as
$$B(Y) = \bigl[I - C\bigl(C^TU_0(X)C\bigr)^+C^TU_0(X)\bigr]X,\qquad X = \bigl(A^TW_0(Y)A\bigr)^+A^TW_0(Y)Y.$$
Note also that any technique used to calculate the $\ell_p$ and $\ell_q$ approximations of Theorem 3.4 a) and b) can be used to calculate c).
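Formula (3.1) is straightforward to evaluate once $W_0$ and $U_0$ are available. A consistency check is possible outside the range $p, q > 2$ treated here: with uniform weights, which is what the $p = q = 2$ case would use, (3.1) collapses to the Moore-Penrose solution $A^+Y$. The NumPy sketch below takes the weights as inputs. For $p, q > 2$ they would come from the iteration of Section 3.2, applied first to $Y = AX$ and then to $X = CZ$. The helper names are hypothetical, and pseudoinverses use an explicit relative cutoff.

```python
import numpy as np

def pinv_tol(M, rel=1e-10):
    """Moore-Penrose inverse via the SVD, dropping singular values below rel * max."""
    U, sv, Vt = np.linalg.svd(M, full_matrices=False)
    keep = sv > rel * sv.max()
    return (Vt[keep].T / sv[keep]) @ U[:, keep].T


def null_space_basis(A, rel=1e-10):
    """Columns: an orthonormal basis of N(A), one choice of C in Lemma 3.3."""
    _, sv, Vt = np.linalg.svd(A)
    rank = int(np.sum(sv > rel * sv.max()))
    return Vt[rank:].T


def pq_inverse_apply(A, Y, w0, u0, C):
    """Evaluate (3.1) for given diagonal weights w0 (W_0) and u0 (U_0)."""
    W0, U0 = np.diag(w0), np.diag(u0)
    X = pinv_tol(A.T @ W0 @ A) @ A.T @ W0 @ Y                 # A^g E(Y) step
    P = np.eye(A.shape[1]) - C @ pinv_tol(C.T @ U0 @ C) @ C.T @ U0
    return P @ X                                              # (I - F) step


rng = np.random.default_rng(7)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # 6 x 4, rank 2
Y = rng.standard_normal(6)
C = null_space_basis(A)
B_Y = pq_inverse_apply(A, Y, np.full(6, 1 / 6), np.full(4, 1 / 4), C)
print(np.allclose(B_Y, pinv_tol(A) @ Y))
```

Because $W_0$ and $U_0$ depend on $Y$, `pq_inverse_apply` produces $B(Y)$ for one $Y$ at a time. This matches the remark that $B$ need not be linear.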